BioMed Research International: Bioinformatics
The latest articles from Hindawi Publishing Corporation
© 2016, Hindawi Publishing Corporation. All rights reserved.

A Comparative Study of Land Cover Classification by Using Multispectral and Texture Data
Wed, 08 Jun 2016 07:50:20 +0000
The main objective of this study is to assess the value of a machine vision approach for classifying five types of land cover: bare land, desert rangeland, green pasture, fertile cultivated land, and Sutlej river land. A novel spectra-statistical framework is designed to classify these land cover types accurately. Multispectral data for the land covers were acquired with a handheld multispectral radiometer in five spectral bands (blue, green, red, near infrared, and shortwave infrared), while texture data were acquired with a digital camera, each image being transformed into 229 texture features. The 30 most discriminant features of each image were selected by combining three statistical feature selection techniques: Fisher, Probability of Error plus Average Correlation, and Mutual Information (F + PA + MI). The clustering of the selected texture data was verified by nonlinear discriminant analysis, while linear discriminant analysis was applied to the multispectral data. For classification, the texture and multispectral data were fed to an artificial neural network (ANN: n-class). Using an 80-20 cross-validation split, we obtained accuracies of 91.332% for the texture data and 96.40% for the multispectral data.
Salman Qadri, Dost Muhammad Khan, Farooq Ahmad, Syed Furqan Qadri, Masroor Ellahi Babar, Muhammad Shahid, Muzammil Ul-Rehman, Abdul Razzaq, Syed Shah Muhammad, Muhammad Fahad, Sarfraz Ahmad, Muhammad Tariq Pervez, Nasir Naveed, Naeem Aslam, Mutiullah Jamil, Ejaz Ahmad Rehmani, Nazir Ahmad, and Naeem Akhtar Khan. Copyright © 2016 Salman Qadri et al.
All rights reserved.

Finding Clocks in Genes: A Bayesian Approach to Estimate Periodicity
Thu, 02 Jun 2016 13:49:37 +0000
Identifying rhythmic gene expression, from metabolic cycles to circadian rhythms, is crucial for understanding the gene regulatory networks and functions of these biological processes. Recently, two algorithms, JTK_CYCLE and ARSER, have been developed to estimate the periodicity of rhythmic gene expression. JTK_CYCLE performs well for long or less noisy time series, while ARSER performs well for detecting a single rhythmic category. However, observing gene expression at high temporal resolution is not always feasible, and many scientists are interested in exploring ultradian and circadian rhythmic categories simultaneously. In this paper, a new algorithm named autoregressive Bayesian spectral regression (ABSR) is proposed. It estimates the period of time-course experimental data and classifies gene expression profiles into multiple rhythmic categories simultaneously. Simulation studies show that, for data with low temporal resolution, ABSR substantially improves the accuracy of periodicity estimation and of the clustering of rhythmic categories compared with JTK_CYCLE and ARSER. Moreover, ABSR is insensitive to rhythmic patterns. The new scheme is applied to existing time-course mouse liver data to estimate the period of rhythms and to classify genes into ultradian, circadian, and arrhythmic categories. Of the circadian profiles detected by JTK_CYCLE with 1-hour resolution, 49.2% are also detected by ABSR with only 4-hour resolution.
Yan Ren, Christian I. Hong, Sookkyung Lim, and Seongho Song. Copyright © 2016 Yan Ren et al. All rights reserved.
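The abstract does not give ABSR's model equations, so the following is only a much simpler, non-Bayesian point of comparison, not ABSR itself: it scans a list of candidate periods and keeps the one whose least-squares cosine/sine fit leaves the smallest residual, an approach that also works for coarsely sampled (for example, 4-hour) profiles. The function names, the candidate grid, and the synthetic profile are hypothetical, and the sketch omits ABSR's autoregressive error term, its Bayesian inference, and its classification into ultradian, circadian, and arrhythmic categories.

```python
import math


def harmonic_rss(times, values, period):
    """Residual sum of squares after a least-squares fit of
    values ~ m + a*cos(2*pi*t/period) + b*sin(2*pi*t/period).

    Returns None when the cosine and sine columns are (nearly) collinear,
    e.g. at the Nyquist period of evenly spaced samples.
    """
    n = len(times)
    w = 2.0 * math.pi / period
    cos_t = [math.cos(w * t) for t in times]
    sin_t = [math.sin(w * t) for t in times]
    # Centering every column absorbs the intercept m, leaving a 2x2 system.
    mc, ms, my = sum(cos_t) / n, sum(sin_t) / n, sum(values) / n
    c = [v - mc for v in cos_t]
    s = [v - ms for v in sin_t]
    z = [v - my for v in values]
    scc = sum(x * x for x in c)
    sss = sum(x * x for x in s)
    scs = sum(x * y for x, y in zip(c, s))
    scz = sum(x * y for x, y in zip(c, z))
    ssz = sum(x * y for x, y in zip(s, z))
    szz = sum(x * x for x in z)
    det = scc * sss - scs * scs
    if det <= 1e-9 * n * n:
        return None
    # Solve the 2x2 normal equations by Cramer's rule.
    a = (scz * sss - ssz * scs) / det
    b = (ssz * scc - scz * scs) / det
    # Residual = total centered sum of squares minus the explained part.
    return szz - (a * scz + b * ssz)


def estimate_period(times, values, candidate_periods):
    """Return (period, rss) for the candidate with the smallest residual."""
    best = None
    for period in candidate_periods:
        rss = harmonic_rss(times, values, period)
        if rss is not None and (best is None or rss < best[1]):
            best = (period, rss)
    return best


if __name__ == "__main__":
    # Hypothetical noise-free profile: 48 h sampled every 4 h, true period 24 h.
    times = [4.0 * k for k in range(12)]
    values = [2.0 + math.cos(2.0 * math.pi * t / 24.0) for t in times]
    period, rss = estimate_period(times, values, range(8, 33))
    print(period)  # 24 for this noise-free profile
```

With 4-hour sampling the shortest resolvable period is 8 hours, which is why the sketch skips that candidate instead of fitting an ill-conditioned system.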
Therapeutic Effects of CUR-Activated Human Umbilical Cord Mesenchymal Stem Cells on 1-Methyl-4-phenylpyridine-Induced Parkinson’s Disease Cell Model
Tue, 31 May 2016 07:13:19 +0000
The purpose of this study is to evaluate the therapeutic effects of human umbilical cord-derived mesenchymal stem cells (hUC-MSC) activated by curcumin (CUR) on PC12 cells treated with 1-methyl-4-phenylpyridinium ion (MPP+), a cell model of Parkinson’s disease (PD). Supernatants of hUC-MSC and of hUC-MSC activated by 5 µmol/L CUR (hUC-MSC-CUR) were collected at the same concentration. Cell proliferation, differentiation potential toward dopaminergic neuronal cells, and antioxidant activity were assessed in PC12 cells after treatment with either of the two supernatants or with 5 µmol/L CUR. The hUC-MSC-CUR supernatant more markedly promoted proliferation and the expression of tyrosine hydroxylase (TH) and microtubule-associated protein 2 (MAP2), and it significantly decreased the production of nitric oxide (NO) and the expression of inducible nitric oxide synthase (iNOS) in PC12 cells. Furthermore, cytokine detection indicated that the expression of IL-6, IL-10, and NGF was significantly higher in the hUC-MSC-CUR group than in the other two groups. Therefore, hUC-MSC-CUR may be a potential strategy for promoting the proliferation and differentiation of cells in this PD model, providing new insights into a novel therapeutic approach for PD.
Li Jinfeng, Wang Yunliang, Liu Xinshan, Wang Yutong, Wang Shanshan, Xue Peng, Yang Xiaopeng, Xu Zhixiu, Lu Qingshan, Yin Honglei, Cao Xia, Wang Hongwei, and Cao Bingzhen. Copyright © 2016 Li Jinfeng et al. All rights reserved.
The Effects of Real-Time Interactive Multimedia Teleradiology System
Tue, 17 May 2016 14:18:35 +0000
This study describes the design of a real-time interactive multimedia teleradiology system and assesses how referring physicians use it in point-of-care situations and how it supports or hinders aspects of physician-radiologist interaction. We developed a real-time multimedia teleradiology management system that automates the transfer of images and radiologists’ reports, and we surveyed physicians to triangulate the findings and to verify the realism and results of the experiment. The web-based survey was delivered to 150 physicians from a range of specialties and was completed by 72% of them. The data showed a correlation between rich interactivity, satisfaction, and effectiveness. The results suggest that real-time multimedia teleradiology systems are valued by referring physicians and may have the potential to enhance their practice and improve patient care; they also highlight the critical role of multimedia technologies in providing real-time multimode interactivity in current medical care.
Lilac Al-Safadi. Copyright © 2016 Lilac Al-Safadi. All rights reserved.

Impacts of Nonsynonymous Single Nucleotide Polymorphisms of Adiponectin Receptor 1 Gene on Corresponding Protein Stability: A Computational Approach
Sun, 15 May 2016 10:00:10 +0000
Despite the reported association of adiponectin receptor 1 (ADIPOR1) gene mutations with vulnerability to several human metabolic diseases, computational analyses of the functional and structural impacts of single nucleotide polymorphisms (SNPs) of human ADIPOR1 at the protein level are lacking. Therefore, sequence- and structure-based computational tools were employed in this study to characterize, functionally and structurally, the coding nonsynonymous SNPs (nsSNPs) of the ADIPOR1 gene listed in the dbSNP database.
Our in silico analysis with the SIFT, nsSNPAnalyzer, PolyPhen-2, Fathmm, I-Mutant 2.0, SNPs&GO, PhD-SNP, PANTHER, and SNPeffect tools identified five nsSNPs with damaging functional impacts among the 74 nsSNPs of the ADIPOR1 gene: rs765425383 (A348G), rs752071352 (H341Y), rs759555652 (R324L), rs200326086 (L224F), and rs766267373 (L143P). Finally, these five deleterious nsSNPs were introduced into the X-ray crystal structure of the ADIPOR1 protein using the Swiss-PDB Viewer package, and the free energy changes caused by these mutations were computed. Free energy increased for all the mutants, with H341Y causing the largest increase. RMSD and TM-scores indicated that the mutants were structurally similar to the wild-type protein. Our analyses suggest that these variants, especially H341Y, could directly or indirectly destabilize the amino acid interactions and hydrogen-bonding networks of ADIPOR1.
Md. Abu Saleh, Md. Solayman, Sudip Paul, Moumoni Saha, Md. Ibrahim Khalil, and Siew Hua Gan. Copyright © 2016 Md. Abu Saleh et al. All rights reserved.

Detecting Susceptibility to Breast Cancer with SNP-SNP Interaction Using BPSOHS and Emotional Neural Networks
Wed, 11 May 2016 08:24:34 +0000
Studies of the association between diseases and informative single nucleotide polymorphisms (SNPs) have received great attention. However, most of them simply use the whole set of useful SNPs and fail to consider SNP-SNP interactions, even though such interactions have been demonstrated in biological experiments. In this paper, we use a binary particle swarm optimization with hierarchical structure (BPSOHS) algorithm to improve the effectiveness of PSO in identifying SNP-SNP interactions. Furthermore, to use these SNP interactions in susceptibility analysis, we propose an emotional neural network (ENN) that treats SNP interactions as an emotional tendency.
Unlike a conventional architecture, and analogous to the emotional brain, this architecture provides a dedicated path for processing the emotional value, through which SNP interactions can be taken into account more quickly and directly. The ENN allows prior knowledge about SNP interactions to be used together with other influencing factors. Finally, the experimental results show that the proposed BPSOHS_ENN algorithm can detect informative SNP-SNP interactions and predict breast cancer risk with much higher accuracy than existing methods.
Xiao Wang, Qinke Peng, and Yue Fan. Copyright © 2016 Xiao Wang et al. All rights reserved.

The Use of Protein-Protein Interactions for the Analysis of the Associations between PM2.5 and Some Diseases
Sun, 08 May 2016 11:57:09 +0000
Pollution levels are rising rapidly worldwide, and PM2.5 is one of the most important pollutants. Environmental pollution is known to cause several problems, such as the greenhouse effect and acid rain; the most important of these, however, is that pollutants can induce a number of serious diseases. Some studies have reported that PM2.5 is an important etiological factor in lung cancer. In this study, we extensively investigate the associations between PM2.5 and the 22 disease classes recommended by Goh et al., such as respiratory, cardiovascular, and gastrointestinal diseases. Protein-protein interactions were used to measure the linkage between disease genes and genes reported to be modulated by PM2.5. The results suggest that some diseases, such as ear, nose, and throat diseases and gastrointestinal, nutritional, renal, and cardiovascular diseases, are influenced by PM2.5, and supporting evidence is provided. For example, a total of 18 genes related to cardiovascular diseases are identified as closely related to PM2.5, and the cardiovascular disease gene DSP is significantly related to the PM2.5 gene JUP.
Qing Zhang, Pei-Wei Zhang, and Yu-Dong Cai. Copyright © 2016 Qing Zhang et al. All rights reserved.

Bioinformatics Applications in Life Sciences and Technologies
Wed, 04 May 2016 12:59:07 +0000
Sílvia A. Sousa, Jorge H. Leitão, Raul C. Martins, João M. Sanches, Jasjit S. Suri, and Alejandro Giorgetti. Copyright © 2016 Sílvia A. Sousa et al. All rights reserved.

Differential Proteomics Analysis of Colonic Tissues in Patients of Slow Transit Constipation
Sat, 30 Apr 2016 14:08:05 +0000
Objective. To investigate and screen the proteins differentially expressed between patients with slow transit constipation (STC) and normal controls using a comparative proteomic approach. Methods. Two-dimensional electrophoresis was applied to separate the proteins in specimens from 5 STC patients and 5 normal controls. Proteins with statistically significant differential expression between the two groups were identified by computer-aided image analysis and matrix-assisted laser desorption/ionization tandem time-of-flight mass spectrometry (MALDI-TOF-MS). Results. A total of 239 protein spots were identified in the average gel of the normal controls and 215 in that of the STC patients. Of these, 197 protein spots were matched, with a mean matching rate of 82%. Fourteen protein spots showed statistically significant differences in expression between the groups: 12 were markedly increased and 2 were significantly decreased. Conclusion. The proteomic expression profile of colonic specimens from STC patients differs significantly from that of normal controls, which may be associated with the pathogenesis of STC.
Songlin Wan, Weicheng Liu, Cuiping Tian, Xianghai Ren, Zhao Ding, Qun Qian, Congqing Jiang, and Yunhua Wu. Copyright © 2016 Songlin Wan et al. All rights reserved.
Discovery of Azurin-Like Anticancer Bacteriocins from Human Gut Microbiome through Homology Modeling and Molecular Docking against the Tumor Suppressor p53
Sat, 30 Apr 2016 13:22:47 +0000
Azurin from Pseudomonas aeruginosa is a known anticancer bacteriocin that can specifically penetrate human cancer cells and induce apoptosis. We hypothesized that pathogenic and commensal bacteria with long-term residence in the human body can produce azurin-like bacteriocins as a weapon against cancer invasion. In our previous work, putative bacteriocins were screened from the complete genomes of 66 dominant bacterial species in the human gut microbiota and subsequently characterized by subjecting them to functional annotation algorithms, with azurin as a control. We qualitatively predicted 14 putative bacteriocins with functional properties very similar to those of azurin. In this work, we perform several quantitative and structure-based analyses, including hydrophobic percentage calculation, structural modeling, and molecular docking of the bacteriocins of interest against p53, a cancer target protein. Finally, we identified 8 putative bacteriocins that bind p53 in the same manner as p28-azurin and azurin: 3 peptides (p1seq16, p2seq20, and p3seq24) shared with our previous study and 5 novel ones (p1seq09, p2seq05, p2seq08, p3seq02, and p3seq17) discovered for the first time. These bacteriocins are suggested for further in vitro tests in different neoplastic cell lines.
Chuong Nguyen and Van Duy Nguyen. Copyright © 2016 Chuong Nguyen and Van Duy Nguyen. All rights reserved.
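The hydrophobic percentage calculation mentioned in the azurin-like bacteriocin study can be sketched as the share of residues belonging to a chosen hydrophobic set, expressed as a percentage. Which residues count as hydrophobic is an assumption here: the set used below (A, C, F, I, L, M, V, W) is one common convention and may not match the authors' definition, and the peptide in the usage example is hypothetical rather than one of the bacteriocins named in the abstract.

```python
# Assumed hydrophobic residue set (one-letter codes); conventions vary.
HYDROPHOBIC = frozenset("ACFILMVW")

# The 20 standard amino acid one-letter codes.
STANDARD = frozenset("ACDEFGHIKLMNPQRSTVWY")


def hydrophobic_percentage(sequence, hydrophobic=HYDROPHOBIC):
    """Percentage of residues in `sequence` that belong to `hydrophobic`."""
    seq = "".join(sequence.split()).upper()  # drop whitespace and newlines
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - STANDARD
    if unknown:
        raise ValueError(f"non-standard residue codes: {sorted(unknown)}")
    count = sum(1 for aa in seq if aa in hydrophobic)
    return 100.0 * count / len(seq)


if __name__ == "__main__":
    # Hypothetical peptide: A, L, and V are counted; K and G are not.
    print(hydrophobic_percentage("ALVKKG"))  # 50.0
```

Passing a different `hydrophobic` set lets the same function reproduce whichever residue convention a given study uses.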
Predicting Subcellular Localization of Apoptosis Proteins Combining GO Features of Homologous Proteins and Distance Weighted KNN Classifier
Sun, 24 Apr 2016 07:09:15 +0000
Apoptosis proteins play a key role in maintaining the stability of organisms, and their functions are related to their subcellular locations, which help in understanding the mechanism of programmed cell death. In this paper, we use GO annotation information of apoptosis proteins and their homologous proteins, retrieved from the GOA database, to construct feature vectors, and we combine these with a distance-weighted KNN classification algorithm, which addresses the data imbalance in the CL317 data set, to predict the subcellular locations of apoptosis proteins. We find that the number of homologous proteins affects the overall prediction accuracy. With the optimal number of homologous proteins, the overall prediction accuracy of our method on the CL317 data set reaches 96.8% in the jackknife test. Comparison with other existing methods shows that our proposed method is very effective and outperforms them in predicting the subcellular localization of apoptosis proteins.
Xiao Wang, Hui Li, Qiuwen Zhang, and Rong Wang. Copyright © 2016 Xiao Wang et al. All rights reserved.

Predicted 3D Model of the Rabies Virus Glycoprotein Trimer
Sun, 24 Apr 2016 06:03:45 +0000
The ectodomain of the rabies virus glycoprotein (RABVG) is a homotrimer, and these trimers are often called spikes. They are responsible for attaching the virus to cells through interactions with nicotinic acetylcholine receptors, the neural cell adhesion molecule (NCAM), and the p75 neurotrophin receptor (p75NTR), which makes them relevant to viral pathogenesis. The antigenic structure differs significantly between the trimers and the monomers.
Surfaces rich in hydrophobic amino acids, in which the C-terminus of the ectodomain plays an important role, are important for trimer stabilization. To understand these interactions between the G proteins, a mechanistic study of their functions was performed with a molecular model of the G protein in its trimeric form, which verified its 3D conformation. The G protein was modeled with the I-TASSER server and evaluated with a Ramachandran plot and the ERRAT program, which gave 84.64% of residues in favorable regions and an overall quality factor of 89.9%. Molecular dynamics simulations of the RABVG trimer were carried out at 310 K, and the RMSD values of the Cα atoms were used to assess stability. A preliminary model of the rabies virus G protein that remained stable over 12 ns of molecular dynamics was obtained.
Bastida-González Fernando, Celaya-Trejo Yersin, Correa-Basurto José, and Zárate-Segura Paola. Copyright © 2016 Bastida-González Fernando et al. All rights reserved.

A Comprehensive Curation Shows the Dynamic Evolutionary Patterns of Prokaryotic CRISPRs
Mon, 18 Apr 2016 13:31:39 +0000
Motivation. Clustered regularly interspaced short palindromic repeats (CRISPRs) are genetic elements that actively regulate invading foreign genes in prokaryotic genomes, and together with the CRISPR-associated (Cas) gene Cas9 they have been engineered into one of the modern genome editing technologies. Because of inconsistent definitions, existing CRISPR detection programs appear to have missed some weak CRISPR signals. Results. This study manually curates all currently annotated CRISPR elements in prokaryotic genomes and proposes 95 updates to the annotations. A new definition is proposed to cover all CRISPRs. A comprehensive comparison of CRISPR numbers at the taxonomic levels of both domain and genus shows high variation among closely related species, even within the same genus.
A detailed investigation of how CRISPRs have been modified during evolution in the 8 completely sequenced species of the genus Thermoanaerobacter shows that transposons frequently split long CRISPRs into shorter ones over a long evolutionary history.
Guoqin Mai, Ruiquan Ge, Guoquan Sun, Qinghan Meng, and Fengfeng Zhou. Copyright © 2016 Guoqin Mai et al. All rights reserved.

The Occurrence of Genetic Alterations during the Progression of Breast Carcinoma
Thu, 14 Apr 2016 09:28:52 +0000
The relationship between the genetic alterations that arise during carcinoma development and the order in which they occur is not completely understood. Interpreting the mechanisms of copy number variation (CNV) is essential for understanding the etiology of genetic disorders. An oncogenetic tree is a special type of phylogenetic tree that provides an inferential, pictorial representation of oncogenesis. In the present study, we constructed an oncogenetic tree to model the occurrence of genetic and cytogenetic alterations in human breast cancer. The model was built on the CNV of the ErbB2, AKT2, KRAS, PIK3CA, PTEN, and CCND1 genes in 963 human breast tumors with sequencing and copy number alteration (CNA) data from TCGA. The results indicate that ErbB2 copy number variation is a frequent early event in human breast cancer. The oncogenetic tree model, based on the phylogenetic tree, is a type of mathematical model that may eventually provide a better understanding of the process of oncogenesis.
Xiao-Chen Li, Chenglin Liu, Tao Huang, and Yang Zhong. Copyright © 2016 Xiao-Chen Li et al. All rights reserved.

Methylation Status of SP1 Sites within miR-23a-27a-24-2 Promoter Region Influences Laryngeal Cancer Cell Proliferation and Apoptosis
Wed, 23 Mar 2016 12:17:47 +0000
DNA methylation plays critical roles in the regulation of microRNA expression and function.
The miR-23a-27a-24-2 cluster has various functions, and its aberrant expression is a common event in many cancers. However, whether DNA methylation influences the expression and function of the cluster has not been reported. Here we found a CG-rich region spanning two SP1 sites in the cluster promoter region. The SP1 sites were demethylated in Hep2 cells and methylated in HEK293 cells. Correspondingly, the cluster was significantly upregulated in Hep2 cells and downregulated in HEK293 cells. When the methyl donor S-adenosyl-L-methionine was introduced into Hep2 cells, the SP1 sites were remethylated and the cluster was significantly downregulated. Moreover, S-adenosyl-L-methionine significantly increased Hep2 cell viability and repressed early apoptosis of Hep2 cells. We also found that the construct with two SP1 sites had the highest luciferase activity and that SP1 specifically bound the cluster promoter in vitro. We conclude that demethylated SP1 sites in the miR-23a-27a-24-2 cluster promoter upregulate the cluster's expression, promoting proliferation and inhibiting early apoptosis in laryngeal cancer cells.
Ye Wang, Zhao-Xiong Zhang, Sheng Chen, Guang-Bin Qiu, Zhen-Ming Xu, and Wei-Neng Fu. Copyright © 2016 Ye Wang et al. All rights reserved.

E-Index for Differentiating Complex Dynamic Traits
Tue, 15 Mar 2016 16:43:19 +0000
Understanding how the underlying network of genes regulates complex dynamic traits is a daunting challenge in current biology. Functional mapping, a tool for mapping quantitative trait loci (QTLs) and single nucleotide polymorphisms (SNPs), has been applied in a variety of cases to tackle this challenge. Though useful and powerful, functional mapping performs well only when one or more model parameters clearly account for the developmental trajectory, which is typically a logistic curve. Moreover, it does not work when the curves are more complex, especially when they are not monotonic.
To overcome this limitation, we propose a mathematical-biological concept and measure, the E-index (earliness index), which cumulatively measures how early a variable (or dynamic trait) increases or decreases its value. Theoretical proofs and simulation studies show that the E-index is more general than functional mapping and can be applied to any complex dynamic trait, including those with logistic curves and those with nonmonotonic curves. An E-index vector is also proposed to capture more subtle differences in developmental patterns.
Jiandong Qi, Jianfeng Sun, and Jianxin Wang. Copyright © 2016 Jiandong Qi et al. All rights reserved.

Using Small RNA Deep Sequencing Data to Detect Human Viruses
Tue, 15 Mar 2016 13:33:25 +0000
Small RNA sequencing (sRNA-seq) can be used to detect viruses in infected hosts without any prior knowledge or specialized sample preparation. The sRNA-seq method was first used for viral detection and identification in plants and then in invertebrates and fungi. However, its use for detecting mammalian or human viruses remains controversial. In this study, we used 931 sRNA-seq runs from the NCBI SRA database to detect and identify viruses in human cells or tissues, particularly in some clinical samples. Six viruses, HPV-18, HBV, HCV, HIV-1, SMRV, and EBV, were detected in 36 runs. Four of these were consistent with the annotations from previous studies. HIV-1 was found in clinical samples without HIV-positive reports, and SMRV was found in diffuse large B-cell lymphoma cells for the first time. In conclusion, these results suggest that sRNA-seq can be used to detect viruses in mammals, including humans.
Fang Wang, Yu Sun, Jishou Ruan, Rui Chen, Xin Chen, Chengjie Chen, Jan F. Kreuze, ZhangJun Fei, Xiao Zhu, and Shan Gao. Copyright © 2016 Fang Wang et al. All rights reserved.
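The sRNA-seq virus detection abstract does not describe its mapping pipeline, so the following is only a toy illustration of the general idea of matching small RNA reads against viral reference sequences, not the authors' method. It indexes the k-mers of each reference and counts the reads that share at least one k-mer with it on either strand, since small RNAs can derive from both strands. The reference names, sequences, reads, and the value of k are hypothetical; a real analysis would use a read aligner and a curated viral genome database.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def normalize(seq):
    """Upper-case and convert RNA (U) to the DNA alphabet (T)."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq):
    """Reverse complement in the DNA alphabet."""
    return normalize(seq).translate(_COMPLEMENT)[::-1]


def kmers(seq, k):
    """Set of all length-k substrings (empty if seq is shorter than k)."""
    seq = normalize(seq)
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def screen_reads(reads, references, k=15):
    """Count, per reference, the reads sharing >= 1 k-mer on either strand."""
    index = {name: kmers(ref, k) for name, ref in references.items()}
    hits = dict.fromkeys(references, 0)
    for read in reads:
        read_kmers = kmers(read, k) | kmers(reverse_complement(read), k)
        for name, ref_kmers in index.items():
            if not read_kmers.isdisjoint(ref_kmers):
                hits[name] += 1
    return hits


if __name__ == "__main__":
    # Hypothetical references and reads, for illustration only.
    references = {
        "virus_A": "ATGCGTACGTTAGCCGATCGATCGGCTAAGCTTACGGATCC",
        "virus_B": "TTTTGGGGCCCCAAAATTTTGGGGCCCCAAAAGGGGTTTT",
    }
    read = references["virus_A"][4:24]  # a 20-nt small RNA from virus_A
    reads = [read, reverse_complement(read), "ACAC" * 5]
    print(screen_reads(reads, references))  # {'virus_A': 2, 'virus_B': 0}
```

Requiring more than one shared k-mer, or a longer k, would make such a screen stricter at the cost of missing short or divergent reads.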
RF-Phos: A Novel General Phosphorylation Site Prediction Tool Based on Random Forest
Tue, 15 Mar 2016 12:13:52 +0000
Protein phosphorylation is one of the most widespread regulatory mechanisms in eukaryotes. Over the past decade, phosphorylation site prediction has emerged as an important problem in bioinformatics. Here, we report a new method, termed Random Forest-based Phosphosite predictor 2.0 (RF-Phos 2.0), that predicts phosphorylation sites given only the primary amino acid sequence of a protein as input. RF-Phos 2.0, which uses a random forest with sequence and structural features, can identify putative phosphorylation sites across many protein families. In side-by-side comparisons based on 10-fold cross-validation and an independent dataset, RF-Phos 2.0 compares favorably with other popular mammalian phosphosite prediction methods, such as PhosphoSVM, GPS2.1, and Musite.
Hamid D. Ismail, Ahoi Jones, Jung H. Kim, Robert H. Newman, and Dukka B. KC. Copyright © 2016 Hamid D. Ismail et al. All rights reserved.

SNP Mining in Functional Genes from Nonmodel Species by Next-Generation Sequencing: A Case of Flowering, Pre-Harvest Sprouting, and Dehydration Resistant Genes in Wheat
Mon, 14 Mar 2016 10:48:25 +0000
Because many nonmodel plants lack genomic sequences, combining molecular technologies with next-generation sequencing (NGS) platforms has provided a new approach to studying the genetic variation of these plants. Software such as GATK, SOAPsnp, and samtools is often used to process NGS data. In this study, BLAST was applied to call SNPs from the mixed sequence data of 16 functional genes of polyploid wheat. In total, 1.2 million reads were obtained, with an average of 7500 reads per gene. To obtain accurate information, 390,992 read pairs were successfully assembled before being aligned to the functional genes. Standalone BLAST tools were used to map the assembled sequences to each functional gene.
Polynomial fitting was used to determine a suitable minor allele frequency (MAF) threshold of 6% for the assembled reads of each functional gene. Comparing the accuracy of SNPs from assembled reads, pretrimmed reads, and original reads showed that SNPs mined from the assembled reads were more reliable than the others. The study also demonstrated that sequencing mixed samples by NGS and then analyzing them with BLAST is an effective, low-cost, and accurate way to mine SNPs in nonmodel species. Assembled reads and a polynomial-fitting threshold are recommended for more accurate SNP detection.
Zhong-Xu Chen, Mei Deng, and Ji-Rui Wang. Copyright © 2016 Zhong-Xu Chen et al. All rights reserved.

A Multifeatures Fusion and Discrete Firefly Optimization Method for Prediction of Protein Tyrosine Sulfation Residues
Thu, 10 Mar 2016 08:27:20 +0000
Tyrosine sulfation is a ubiquitous protein posttranslational modification in which sulfate groups are added to tyrosine residues. It plays significant roles in various physiological processes in eukaryotic cells. A prerequisite for exploring the molecular mechanism of tyrosine sulfation is the correct identification of potential tyrosine sulfation residues. In this paper, a novel method is presented to predict protein tyrosine sulfation residues from primary sequences. Through informative feature construction, elaborate feature selection, and a parameter optimization scheme, the proposed predictor achieved promising results and outperformed many other state-of-the-art predictors. Using the optimal feature subset, the proposed method achieved a mean Matthews correlation coefficient (MCC) of 94.41% on the benchmark dataset and an MCC of 90.09% on the independent dataset. These results indicate that the new method could be effective in identifying this important protein posttranslational modification and that the feature selection scheme could be powerful in protein functional residue prediction.
Song Guo, Chunhua Liu, Peng Zhou, and Yanling Li. Copyright © 2016 Song Guo et al. All rights reserved.

Treating Diabetes Mellitus: Pharmacophore Based Designing of Potential Drugs from Gymnema sylvestre against Insulin Receptor Protein
Sun, 28 Feb 2016 14:16:49 +0000
Diabetes mellitus (DM) is one of the most prevalent metabolic disorders and can severely affect quality of life. DM is currently treated with injectable insulin, which is associated with considerable patient inconvenience. Small molecules that act as insulin receptor (IR) agonists would be better alternatives to insulin injection. Herein, ten bioactive small compounds derived from Gymnema sylvestre (G. sylvestre) were chosen, and their IR binding affinity and ADMET properties were determined using a combined approach of molecular docking and computational pharmacokinetic analysis. Structural analogues were also designed for the compounds associated with toxicity and lower IR affinity. Of the ten parent compounds, six showed favorable pharmacokinetic properties with considerable binding affinity toward IR, while four were associated with toxicity and lower IR affinity. Among the forty structural analogues, four showed considerably increased binding affinity toward IR and lower toxicity than their parent compounds. Finally, molecular interaction analysis revealed that six parent compounds and four analogues interact with active-site amino acids of IR. This study may therefore help identify new therapeutics and alternatives to insulin for diabetic patients.
Mohammad Uzzal Hossain, Md. Arif Khan, S. M. Rakib-Uz-Zaman, Mohammad Tuhin Ali, Md. Saidul Islam, Chaman Ara Keya, and Md. Salimullah. Copyright © 2016 Mohammad Uzzal Hossain et al. All rights reserved.
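The tyrosine sulfation predictor described earlier reports its performance as the Matthews correlation coefficient (MCC), a single score that uses all four cells of a binary confusion matrix: MCC = (TP·TN - FP·FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)). MCC ranges from -1 to 1, so a value written as 94.41% presumably means 0.9441. The minimal sketch below computes it from labels; the helper names, the hypothetical labels, and the convention of returning 0 when a marginal total is empty are assumptions, not details from the paper.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels (truthy = positive)."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t and p:
            tp += 1
        elif not t and not p:
            tn += 1
        elif p:  # predicted positive, actually negative
            fp += 1
        else:  # predicted negative, actually positive
            fn += 1
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 if any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical labels: 1 = sulfated tyrosine, 0 = non-sulfated tyrosine.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
    counts = confusion_counts(y_true, y_pred)
    print(counts, round(100 * mcc(*counts), 2))  # (3, 3, 1, 1) 50.0
```

Unlike plain accuracy, MCC stays low when a predictor simply favors the majority class, which matters because sulfated tyrosines are far rarer than unmodified ones.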
Identification of Deleterious Mutations in Myostatin Gene of Rohu Carp (Labeo rohita) Using Modeling and Molecular Dynamic Simulation Approaches
Thu, 25 Feb 2016 15:28:07 +0000
Myostatin (MSTN) is a known negative regulator of skeletal muscle growth. Mutated myostatin produces a double-muscled phenotype, which is of positive significance for farmed animals. However, adequate information is not available for teleosts, including the farmed rohu carp, Labeo rohita. In the absence of experimental evidence, computational algorithms were used to predict the impact of point mutations in rohu myostatin, especially on its structure-function relationships. Four mutations were generated at different positions (p.D76A, p.Q204P, p.C312Y, and p.D313A) of the rohu MSTN protein. The impact of each mutant was analyzed using SIFT, I-Mutant 2.0, PANTHER, and PROVEAN, and two substitutions (p.D76A and p.Q204P) were predicted to be deleterious. Each mutant protein was compared structurally with the native protein using 3D modeling and molecular dynamics simulation. The simulations showed altered dynamic behavior in terms of RMSD and RMSF for each of the p.D76A and p.Q204P substitutions compared with the native protein. Notably, these two mutations had a significant negative impact on protein structure and stability. The present study provides first-hand information for identifying amino acid positions where mutations could be introduced into the MSTN gene of rohu carp and other carps for further in vivo studies.
Kiran Dashrath Rasal, Vemulawada Chakrapani, Swagat Kumar Patra, Shibani D. Mohapatra, Swapnarani Nayak, Sasmita Jena, Jitendra Kumar Sundaray, Pallipuram Jayasankar, and Hirak Kumar Barman. Copyright © 2016 Kiran Dashrath Rasal et al. All rights reserved.

VGSC: A Web-Based Vector Graph Toolkit of Genome Synteny and Collinearity
Wed, 24 Feb 2016 10:02:32 +0000
Background.
To understand the colocalization of genetic loci among species, synteny and collinearity analysis is a frequent task in comparative genomics research. However, many analysis software packages do not visualize results effectively: problems include a lack of graphical visualization, overly simple representations, and output formats that cannot be extended. Moreover, higher-throughput sequencing technology requires higher-resolution image output. Implementation. To fill this gap, this paper presents VGSC, the Vector Graph toolkit of genome Synteny and Collinearity, and its online service, which visualize synteny and collinearity in common graphical formats, including both raster (JPEG, Bitmap, and PNG) and vector graphics (SVG, EPS, and PDF). Result. Users can upload sequence alignments from BLAST and collinearity relationships from synteny analysis tools, and the website generates vector or raster graphical results automatically. We also provide a Java-based bytecode binary for command-line execution.
Yiqing Xu, Changwei Bi, Guoxin Wu, Suyun Wei, Xiaogang Dai, Tongming Yin, and Ning Ye. Copyright © 2016 Yiqing Xu et al. All rights reserved.

Motif-Based Text Mining of Microbial Metagenome Redundancy Profiling Data for Disease Classification
Sun, 14 Feb 2016 14:02:26 +0000
Background. 16S rRNA text data are informative for classifying microbiota-associated diseases. However, the raw text data must be processed systematically before features for classification can be defined and extracted; moreover, the high-dimensional feature spaces generated by the text data pose an additional difficulty. Results. Here we present a Phylogenetic Tree-Based Motif Finding algorithm (PMF) for analyzing 16S rRNA text data. By integrating phylogenetic rules and other statistical indexes for classification, PMF effectively reduces the dimension of the large feature spaces generated by the text datasets.
Using the retrieved motifs in combination with common classification methods, we can discriminate different samples of both pneumonia and dental caries better than other existing methods. Conclusions. We extend phylogenetic approaches to perform supervised learning on microbiota text data to discriminate the pathological states of pneumonia and dental caries. The results show that PMF may enhance the efficiency and reliability of analyzing high-dimensional text data. Yin Wang, Rudong Li, Yuhua Zhou, Zongxin Ling, Xiaokui Guo, Lu Xie, and Lei Liu Copyright © 2016 Yin Wang et al. All rights reserved. Advancements in RNASeqGUI towards a Reproducible Analysis of RNA-Seq Experiments Wed, 10 Feb 2016 13:50:15 +0000 We present the advancements and novelties recently introduced in RNASeqGUI, a graphical user interface that helps biologists handle and analyse large data collected in RNA-Seq experiments. This work focuses on the concept of reproducible research and shows how it has been incorporated in RNASeqGUI to provide reproducible (computational) results. The new version of RNASeqGUI combines graphical interfaces with tools for reproducible research, such as literate statistical programming, human-readable reports, parallel execution, caching, and interactive, web-explorable tables of results. These features allow the user to analyse big datasets in a fast, efficient, and reproducible way. Moreover, this paper represents a proof of concept, showing a simple way to develop computational tools for the life sciences in the spirit of reproducible research. Francesco Russo, Dario Righelli, and Claudia Angelini Copyright © 2016 Francesco Russo et al. All rights reserved. A Prediction Model for Membrane Proteins Using Moments Based Features Sun, 07 Feb 2016 15:42:21 +0000 The cell is the most fundamental unit of the human body. Encapsulated within the cell are many infinitesimal entities and molecules, which are protected by a cell membrane.
The proteins associated with this lipid-based bilayer cell membrane are known as membrane proteins and are considered to play a significant role. These membrane proteins exert their effects on cellular activities inside and outside the cell. According to scientists in pharmaceutical organizations, membrane proteins perform key tasks in drug interactions. In this study, a technique based on various computationally intelligent methods is presented for predicting membrane proteins without the experimental use of mass spectrometry. Statistical moments were used to extract features, and a multilayer neural network was trained using backpropagation to predict membrane proteins. Results show that the proposed technique performs better than existing methodologies. Ahmad Hassan Butt, Sher Afzal Khan, Hamza Jamil, Nouman Rasool, and Yaser Daanial Khan Copyright © 2016 Ahmad Hassan Butt et al. All rights reserved. PIPINO: A Software Package to Facilitate the Identification of Protein-Protein Interactions from Affinity Purification Mass Spectrometry Data Sun, 07 Feb 2016 14:17:44 +0000 The functionality of most proteins is regulated by protein-protein interactions. Hence, the comprehensive characterization of the interactome is the next milestone on the path to understanding the biochemistry of the cell. A powerful method to detect protein-protein interactions is a combination of coimmunoprecipitation or affinity purification with quantitative mass spectrometry. Nevertheless, both methods tend to precipitate a high number of background proteins due to nonspecific interactions. To address this challenge, the software Protein-Protein-Interaction-Optimizer (PIPINO) was developed to perform automated data analysis, to facilitate the selection of bona fide binding partners, and to compare the dynamics of interaction networks.
In this study we investigated the STAT1 interaction network and its activation-dependent dynamics. Stable isotope labeling by amino acids in cell culture (SILAC) was applied to analyze the STAT1 interactome after streptavidin pull-down of biotagged STAT1 from human embryonic kidney 293T cells with and without activation. Starting from more than 2,000 captured proteins, 30 potential STAT1 interaction partners were extracted. Interestingly, more than 50% of these had already been reported or predicted to bind STAT1. Furthermore, 16 proteins, such as STAT3 or the importin subunits alpha 1 and alpha 6, were found to change their binding behavior depending on STAT1 phosphorylation. Stefan Kalkhof, Stefan Schildbach, Conny Blumert, Friedemann Horn, Martin von Bergen, and Dirk Labudde Copyright © 2016 Stefan Kalkhof et al. All rights reserved. Analysis and Identification of Aptamer-Compound Interactions with a Maximum Relevance Minimum Redundancy and Nearest Neighbor Algorithm Wed, 03 Feb 2016 06:40:35 +0000 The development of biochemistry and molecular biology has revealed an increasingly important role of compounds in several biological processes. Like aptamer-protein interactions, aptamer-compound interactions attract increasing attention. However, selecting proper aptamers against compounds using traditional methods, such as systematic evolution of ligands by exponential enrichment, is time-consuming. Thus, there is an urgent need to design effective computational methods for searching for effective aptamers against compounds. This study attempted to extract important features for aptamer-compound interactions using feature selection methods, such as Maximum Relevance Minimum Redundancy, as well as incremental feature selection.
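A minimal sketch of a greedy Maximum Relevance Minimum Redundancy ranking, assuming already-discretized features (continuous aptamer or compound descriptors would need binning first) and the common "difference" criterion, relevance minus mean redundancy, both measured by mutual information. The function names `mutual_information` and `mrmr_rank` are hypothetical and this is not the study's code; incremental feature selection would then train a classifier on growing prefixes of the returned ranking, which is not shown.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # sum of p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts
    return sum((c / n) * log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mrmr_rank(features, labels, k):
    """Greedy mRMR: repeatedly pick the feature with max relevance minus mean redundancy."""
    relevance = {name: mutual_information(col, labels) for name, col in features.items()}
    selected, remaining = [], sorted(features)  # sorted for deterministic tie-breaking
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Invented toy data: x1 is informative, x1_copy duplicates it, x2 adds a little new information.
    y = [0, 0, 0, 1, 1, 1, 1, 1]
    feats = {
        "x1": [0, 0, 0, 0, 1, 1, 1, 1],
        "x1_copy": [0, 0, 0, 0, 1, 1, 1, 1],
        "x2": [0, 0, 1, 1, 0, 0, 1, 1],
    }
    print(mrmr_rank(feats, y, 3))  # ['x1', 'x2', 'x1_copy']
```

The duplicate feature is as relevant as the original but is ranked last, because its redundancy with the already selected `x1` outweighs its relevance; that is the behavior the minimum-redundancy term is meant to produce.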
Each aptamer-compound pair was represented by properties derived from the aptamer and the compound, including frequencies of single nucleotides and dinucleotides for the aptamer, as well as constitutional, electrostatic, quantum-chemical, and space conformational descriptors for the compound. As a result, several important features were obtained. To confirm the importance of the obtained features, we further discussed the associations between them and aptamer-compound interactions. In addition, an optimal prediction model based on the nearest neighbor algorithm was built to identify aptamer-compound interactions; it has the potential to be a useful tool for identifying novel aptamer-compound interactions. The program is available upon request. ShaoPeng Wang, Yu-Hang Zhang, Jing Lu, Weiren Cui, Jerry Hu, and Yu-Dong Cai Copyright © 2016 ShaoPeng Wang et al. All rights reserved. ChemTok: A New Rule Based Tokenizer for Chemical Named Entity Recognition Thu, 28 Jan 2016 06:46:23 +0000 Named Entity Recognition (NER) from text constitutes the first step in many text mining applications. The most important preliminary step for NER systems using machine learning approaches is tokenization, where raw text is segmented into tokens. This study proposes an enhanced rule-based tokenizer, ChemTok, which utilizes rules extracted mainly from the training data set. The main novelty of ChemTok is the use of the extracted rules to merge the tokens split in the previous steps, thus producing longer and more discriminative tokens. ChemTok is compared to the tokenization methods used by ChemSpot and tmChem. Support Vector Machines and Conditional Random Fields are employed as the learning algorithms. The experimental results show that the classifiers trained on the output of ChemTok outperform all classifiers trained on the output of the other two tokenizers in terms of classification performance and the number of incorrectly segmented entities.
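To make the idea of rule-based token merging concrete, here is a toy sketch. It assumes a fine-grained first pass that splits every punctuation character into its own token, and one hypothetical rule type: "glue" punctuation, learned as punctuation seen strictly inside annotated training entities. A glue character with no whitespace on either side is merged with its neighbours. ChemTok's real rules are richer and are not reproduced here; `fine_tokenize`, `learn_glue_chars`, and `merge_tokens` are illustrative names only.

```python
import re

TOKEN_RE = re.compile(r"\w+|[^\w\s]")  # runs of word characters, or single punctuation marks
PUNCT_RE = re.compile(r"[^\w\s]")


def fine_tokenize(text):
    """Split text into (token, start, end) triples; every punctuation mark is its own token."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


def learn_glue_chars(training_entities):
    """Hypothetical rule extraction: punctuation seen strictly inside annotated entity names."""
    glue = set()
    for entity in training_entities:
        for tok, _, _ in fine_tokenize(entity)[1:-1]:
            if PUNCT_RE.fullmatch(tok):
                glue.add(tok)
    return glue


def merge_tokens(text, glue):
    """Re-join fine-grained tokens across glue punctuation that has no whitespace on either side."""
    toks = fine_tokenize(text)

    def interior_glue(i):
        tok, start, end = toks[i]
        return (tok in glue and 0 < i < len(toks) - 1
                and toks[i - 1][2] == start and toks[i + 1][1] == end)

    merged, group_start = [], None
    for i, (_, start, end) in enumerate(toks):
        if group_start is None:
            group_start = start
        joins_next = (i + 1 < len(toks) and toks[i + 1][1] == end
                      and (interior_glue(i) or interior_glue(i + 1)))
        if not joins_next:
            merged.append(text[group_start:end])  # slicing keeps the original characters
            group_start = None
    return merged


if __name__ == "__main__":
    glue = learn_glue_chars(["2-amino-butane", "N,N-dimethylformamide", "(R)-lactate"])
    print(sorted(glue))  # [')', ',', '-']
    print(merge_tokens("Add 2-amino-butane, then N,N-dimethylformamide.", glue))
    # ['Add', '2-amino-butane', ',', 'then', 'N,N-dimethylformamide', '.']
```

Requiring contact on both sides keeps an ordinary clause comma, as after "butane", as a separate token while still joining the comma inside "N,N-dimethylformamide".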
Abbas Akkasi, Ekrem Varoğlu, and Nazife Dimililer Copyright © 2016 Abbas Akkasi et al. All rights reserved. Inhibition of DNA Topoisomerase Type IIα (TOP2A) by Mitoxantrone and Its Halogenated Derivatives: A Combined Density Functional and Molecular Docking Study Wed, 27 Jan 2016 09:17:56 +0000 In this study, mitoxantrone and its halogenated derivatives have been designed by density functional theory (DFT) to explore their structural and thermodynamic properties. The performance of these drugs in inhibiting DNA topoisomerase type IIα (TOP2A) was also evaluated by molecular docking calculations. Noncovalent interactions play a significant role in improving the performance of the halogenated drugs. The combined quantum and molecular mechanics calculations revealed that the CF3-containing drug performs better at inhibiting TOP2A than the other modified drugs. Md. Abu Saleh, Md. Solayman, Mohammad Mazharol Hoque, Mohammad A. K. Khan, Mohammed G. Sarwar, and Mohammad A. Halim Copyright © 2016 Md. Abu Saleh et al. All rights reserved. Segmenting Brain Tissues from Chinese Visible Human Dataset by Deep-Learned Features with Stacked Autoencoder Tue, 26 Jan 2016 13:58:34 +0000 Cryosection brain images in the Chinese Visible Human (CVH) dataset contain rich anatomical structure information of tissues because of their high resolution (e.g., 0.167 mm per pixel). Fast and accurate segmentation of these images into white matter, gray matter, and cerebrospinal fluid plays a critical role in analyzing and measuring the anatomical structures of the human brain. However, most existing automated segmentation methods are designed for computed tomography or magnetic resonance imaging data, and they may not be applicable to cryosection images due to the difference in imaging. In this paper, we propose a supervised learning-based CVH brain tissue segmentation method that uses a stacked autoencoder (SAE) to automatically learn the deep feature representations.
Specifically, our model includes two successive parts: two three-layer SAEs take image patches as input to learn the complex anatomical feature representation, and these features are then sent to a Softmax classifier to infer the labels. Experimental results validated the effectiveness of our method and showed that it outperformed four other classical brain tissue detection strategies. Furthermore, we reconstructed three-dimensional surfaces of these tissues, which show their potential in exploring the high-resolution anatomical structures of the human brain. Guangjun Zhao, Xuchu Wang, Yanmin Niu, Liwen Tan, and Shao-Xiang Zhang Copyright © 2016 Guangjun Zhao et al. All rights reserved. Automated Cell Selection Using Support Vector Machine for Application to Spectral Nanocytology Tue, 19 Jan 2016 16:21:10 +0000 Partial wave spectroscopy (PWS) enables quantification of the statistical properties of cell structures at the nanoscale and has been used to identify patients harboring premalignant tumors by interrogating easily accessible sites distant from the location of the lesion. Due to its high sensitivity, well-preserved cells need to be selected from the smear images for further analysis. To date, such cell selection has been done manually. This is time-consuming and labor-intensive, is vulnerable to bias, and has considerable inter- and intraoperator variability. In this study, we developed a classification scheme to identify and remove corrupted cells or debris that are of no diagnostic value from raw smear images. The slide of the smear sample is digitized by acquiring and stitching low-magnification transmission images. Objects are then extracted from these images through segmentation algorithms. A training set is created by manually classifying objects as suitable or unsuitable. A feature set is created by quantifying a large number of features for each object.
The training set and feature set are used to train a selection algorithm using Support Vector Machine (SVM) classifiers. We show that the selection algorithm achieves an accuracy of 93% with a sensitivity of 95%. Qin Miao, Justin Derbas, Aya Eid, Hariharan Subramanian, and Vadim Backman Copyright © 2016 Qin Miao et al. All rights reserved. Identification of Novel RD1 Antigens and Their Combinations for Diagnosis of Sputum Smear−/Culture+ TB Patients Mon, 18 Jan 2016 13:39:48 +0000 Rapid and accurate diagnosis of pulmonary tuberculosis (PTB) is an unresolved problem worldwide, especially for sputum smear− (S−) cases. In this study, five antigen genes, Rv3871, Rv3874, Rv3875, Rv3876, and Rv3879, were cloned from the Mycobacterium tuberculosis (Mtb) RD1 region and overexpressed to generate antigen fragments. These antigens and their combinations were investigated for PTB serodiagnosis. A total of 298 serum samples were collected from active PTB patients, including 117 sputum smear+ (S+) and sputum culture+ (C+) cases, 101 S−/C+ cases, and 80 S−/C− cases. The serum IgG levels against the five antigens were measured by ELISA. Based on IgG levels, the sensitivity/specificity of Rv3871, Rv3874, Rv3875, Rv3876, and Rv3879 for PTB detection was 81.21%/74.74%, 63.09%/94.78%, 32.21%/87.37%, 62.42%/85.26%, and 83.56%/83.16%, respectively. Furthermore, the optimal result for PTB diagnosis was achieved by combining antigens Rv3871, Rv3876, and Rv3879. In addition, the IgG levels against Rv3871, Rv3876, and Rv3879 were found to be higher in S−/C+ PTB patients than in other PTB populations. More importantly, the combination of the three antigens demonstrated superior diagnostic performance for both S−/C+ and S−/C− PTB. In conclusion, the combination of Rv3871, Rv3876, and Rv3879 induced a higher IgG response in S−/C+ PTB patients and represents a promising biomarker combination for diagnosing PTB.
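The sensitivity/specificity pairs quoted above follow the standard definitions: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The sketch below computes both from binary test calls and shows one simple way to combine antigens, an "any antigen positive" rule. The abstract does not say which combination rule the authors used, so `any_positive` is an assumption, and the eight-sample data are invented purely to illustrate the usual trade-off of such a rule.

```python
def sensitivity_specificity(calls, truth):
    """calls, truth: equal-length sequences of booleans (True = test positive / disease present)."""
    pairs = list(zip(calls, truth))
    tp = sum(1 for c, t in pairs if c and t)
    fn = sum(1 for c, t in pairs if not c and t)
    tn = sum(1 for c, t in pairs if not c and not t)
    fp = sum(1 for c, t in pairs if c and not t)
    return tp / (tp + fn), tn / (tn + fp)


def any_positive(*antigen_calls):
    """Hypothetical combination rule: a sample is positive if any antigen test is positive."""
    return [any(sample) for sample in zip(*antigen_calls)]


if __name__ == "__main__":
    truth = [True, True, True, True, False, False, False, False]  # invented toy labels
    antigen_a = [True, True, False, False, False, False, False, True]
    antigen_b = [False, True, True, False, False, False, True, False]
    print(sensitivity_specificity(antigen_a, truth))  # (0.5, 0.75)
    print(sensitivity_specificity(any_positive(antigen_a, antigen_b), truth))  # (0.75, 0.5)
```

An OR-style rule can only add positives, so sensitivity rises while specificity can only stay the same or fall; choosing an "optimal" combination means balancing the two.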
Zhiqiang Liu, Shuang Qie, Lili Li, Bingshui Xiu, Xiqin Yang, Zhenhua Dai, Xuhui Zhang, Cuimi Duan, Haiping Que, Ping Zhao, Heather Johnson, Heqiu Zhang, and Xiaoyan Feng Copyright © 2016 Zhiqiang Liu et al. All rights reserved. The Subcellular Localization and Functional Analysis of Fibrillarin2, a Nucleolar Protein in Nicotiana benthamiana Sun, 17 Jan 2016 10:20:02 +0000 Nucleolar proteins play important roles in plant cytology, growth, and development. Fibrillarin2 is a nucleolar protein of Nicotiana benthamiana (N. benthamiana). Its cDNA was amplified by RT-PCR and inserted into the expression vector pEarley101 labeled with yellow fluorescent protein (YFP). The fusion protein was localized in the nucleolus and Cajal body of leaf epidermal cells of N. benthamiana. The N. benthamiana fibrillarin2 (NbFib2) protein has three functional domains (i.e., a glycine- and arginine-rich domain, an RNA-binding domain, and an α-helical domain) and a nuclear localization signal (NLS) at the C-terminus. Protein 3D structure analysis predicted that NbFib2 is an α/β protein. In addition, the virus-induced gene silencing (VIGS) approach was used to determine the function of NbFib2. Our results showed that symptoms including growth retardation, organ deformation, chlorosis, and necrosis appeared in NbFib2-silenced N. benthamiana. Luping Zheng, Jinai Yao, Fangluan Gao, Lin Chen, Chao Zhang, Lingli Lian, Liyan Xie, Zujian Wu, and Lianhui Xie Copyright © 2016 Luping Zheng et al. All rights reserved. Modification of the Sweetness and Stability of Sweet-Tasting Protein Monellin by Gene Mutation and Protein Engineering Sun, 10 Jan 2016 10:31:15 +0000 The natural sweet protein monellin has high sweetness and low calorie content, suggesting its potential in food applications. However, due to its low heat and acid resistance, the application of monellin is limited.
In this study, we show that the thermostability of monellin can be improved without decreasing its sweetness by means of sequence and structure analysis and site-directed mutagenesis. We analyzed residues located in the α-helix as well as the ionizable residue C41. Of the mutants investigated, the effects of the E23A and C41A mutants were the most remarkable. The former displayed significantly improved thermal stability, while its sweetness was unchanged. The mutated protein was stable after 30 min incubation at 85°C. The latter showed increased sweetness and a slight improvement in thermostability. Furthermore, we found that most mutants enhancing the thermostability of the protein were located at the two ends of the α-helix. Molecular biophysics analysis revealed that the state of buried ionizable residues may account for the modulated properties of the mutated proteins. Our results demonstrate that the properties of the sweet protein monellin can be modified by means of bioinformatics analysis, gene manipulation, and protein modification, highlighting the possibility of designing novel effective sweet proteins based on structure-function relationships. Qiulei Liu, Lei Li, Liu Yang, Tianming Liu, Chenggu Cai, and Bo Liu Copyright © 2016 Qiulei Liu et al. All rights reserved. Predicting Long Noncoding RNA and Protein Interactions Using Heterogeneous Network Model Tue, 29 Dec 2015 14:36:27 +0000 Recent studies show that long noncoding RNAs (lncRNAs) participate in diverse biological processes and complex diseases. However, the functions of most lncRNAs are still poorly understood. In this study, we propose a network-based computational method, called lncRNA-protein interaction prediction based on Heterogeneous Network Model (LPIHN), to predict potential lncRNA-protein interactions. First, we construct a heterogeneous network by integrating the lncRNA-lncRNA similarity network, the lncRNA-protein interaction network, and the protein-protein interaction (PPI) network.
Then, a random walk with restart is implemented on the heterogeneous network to infer novel lncRNA-protein interactions. The leave-one-out cross validation test shows that our approach can achieve an AUC value of 96.0%. Some lncRNA-protein interactions predicted by our method have been confirmed in recent research or databases, indicating the effectiveness of LPIHN in predicting novel lncRNA-protein interactions. Ao Li, Mengqu Ge, Yao Zhang, Chen Peng, and Minghui Wang Copyright © 2015 Ao Li et al. All rights reserved. Extraction of Protein-Protein Interaction from Scientific Articles by Predicting Dominant Keywords Thu, 10 Dec 2015 06:31:19 +0000 For the automatic extraction of protein-protein interaction information from scientific articles, a machine learning approach is useful. The classifier is generated from training data represented using several features to decide whether a protein pair in each sentence has an interaction. Specific keywords that are directly related to interaction, such as “bind” or “interact”, play an important role in training classifiers. We call such a keyword a dominant keyword because it affects the capability of the classifier. Although it is important to identify dominant keywords, whether a keyword is dominant depends on the context in which it occurs. Therefore, we propose a method for predicting whether a keyword is dominant for each instance. In this method, a keyword that produces imbalanced classification results is initially assumed to be a dominant keyword. Then, classifiers are trained separately on the instances with and without the assumed dominant keywords. The validity of the assumed dominant keyword is evaluated based on the classification results of the generated classifiers, and the assumption is updated according to the evaluation result. Repeating this process increases the prediction accuracy of the dominant keyword.
Our experimental results using five corpora show the effectiveness of our proposed method with dominant keyword prediction. Shun Koyabu, Thi Thanh Thuy Phan, and Takenao Ohkawa Copyright © 2015 Shun Koyabu et al. All rights reserved. Cofunctional Subpathways Were Regulated by Transcription Factor with Common Motif, Common Family, or Common Tissue Tue, 24 Nov 2015 14:10:30 +0000 Dissecting the characteristics of transcription factor (TF) regulatory subpathways is helpful for understanding the underlying regulatory functions of TFs in complex biological systems. To gain insight into the influence of TFs on their regulatory subpathways, we constructed a global TF-subpathways network (TSN) to systematically analyze the regulatory effect of common-motif, common-family, or common-tissue TFs on subpathways. Cluster analysis showed that the common-motif, common-family, or common-tissue TFs that regulated the same pathway classes tended to cluster together and contribute to the same biological functions that lead to disease initiation and progression. Analysis of the Jaccard coefficient showed that the functional consistency of subpathways regulated by TF pairs with a common motif, common family, or common tissue was significantly greater than that of random TF pairs at the subpathway, pathway, and pathway class levels. For example, HNF4A (hepatocyte nuclear factor 4, alpha) and NR1I3 (nuclear receptor subfamily 1, group I, member 3) are a pair of TFs with a common motif, common family, and common tissue. They are involved in drug metabolism pathways and are liver-specific factors required for physiological transcription. In short, we inferred that cofunctional subpathways are regulated by common-motif, common-family, or common-tissue TFs. Fei Su, Desi Shang, Yanjun Xu, Li Feng, Haixiu Yang, Baoquan Liu, Shengyang Su, Lina Chen, and Xia Li Copyright © 2015 Fei Su et al. All rights reserved.
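The Jaccard coefficient used above to measure functional consistency is |A ∩ B| / |A ∪ B|, where A and B are the sets of subpathways regulated by two TFs. A minimal sketch follows that compares the mean coefficient of a chosen group of TF pairs with the mean over all TF pairs as a crude baseline. The TF and subpathway identifiers are invented placeholders, the empty-set convention is an assumption, and the study's own significance test against random TF pairs is not reproduced.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B|; defined here as 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_pair_jaccard(pairs, targets):
    """Average Jaccard coefficient over TF pairs, given a TF -> regulated-subpathways mapping."""
    scores = [jaccard(targets[x], targets[y]) for x, y in pairs]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Invented placeholder data: TF name -> IDs of the subpathways it regulates.
    targets = {
        "TF_A": {"sp1", "sp2", "sp3"},
        "TF_B": {"sp2", "sp3", "sp4"},
        "TF_C": {"sp7"},
    }
    print(jaccard(targets["TF_A"], targets["TF_B"]))  # 0.5
    related = [("TF_A", "TF_B")]  # e.g. a pair sharing a motif, family, or tissue
    print(mean_pair_jaccard(related, targets))  # 0.5
    print(mean_pair_jaccard(combinations(targets, 2), targets))  # mean over all pairs, 1/6
```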
MatPred: Computational Identification of Mature MicroRNAs within Novel Pre-MicroRNAs Mon, 23 Nov 2015 09:23:36 +0000 Background. MicroRNAs (miRNAs) are short noncoding RNAs integral to regulating gene expression at the posttranscriptional level. However, experimental methods often fall short in finding miRNAs expressed at low levels or in specific tissues. While several computational methods have been developed for predicting the localization of mature miRNAs within the precursor transcript, their prediction accuracy requires significant improvement. Methodology/Principal Findings. Here, we present MatPred, which predicts mature miRNA candidates within novel pre-miRNA transcripts. In addition to the relative locus of the mature miRNA within the pre-miRNA hairpin loop and the minimum free energy, we innovatively integrated features that describe nucleotide-specific RNA secondary structure characteristics. In total, 94 features were extracted from the mature miRNA loci and flanking regions. The model was trained using a support vector machine with a radial basis function kernel (RBF/SVM). Our method can predict the precise locations of mature miRNAs, as affirmed by experimentally verified human pre-miRNAs or pre-miRNA candidates, thus achieving a significant advantage over existing methods. Conclusions. MatPred is a highly effective method for identifying mature miRNAs within novel pre-miRNA transcripts. Our model significantly outperformed three other widely used existing methods. Such processing prediction methods may provide important insight into miRNA biogenesis. Jin Li, Ying Wang, Lei Wang, Weixing Feng, Kuan Luan, Xuefeng Dai, Chengzhen Xu, Xianglian Meng, Qiushi Zhang, and Hong Liang Copyright © 2015 Jin Li et al. All rights reserved. Corrigendum to “Information-Theoretical Quantifier of Brain Rhythm Based on Data-Driven Multiscale Representation” Thu, 12 Nov 2015 09:43:55 +0000 Young-Seok Choi and Xiaofeng Jia Copyright © 2015 Young-Seok Choi and Xiaofeng Jia.
All rights reserved. Improved Pre-miRNA Classification by Reducing the Effect of Class Imbalance Tue, 10 Nov 2015 13:09:26 +0000 MicroRNAs (miRNAs) play important roles in the diverse biological processes of animals and plants. Although prediction methods based on machine learning can identify nonhomologous and species-specific miRNAs, they suffer from severe class imbalance between real and pseudo pre-miRNAs. We propose a pre-miRNA classification method based on cost-sensitive ensemble learning and refer to it as MiRNAClassify. Through a series of iterations, the information in all the positive and negative samples is fully exploited. In each iteration, a new classifier instance is trained on an equal number of positive and negative samples. In this way, the negative effect of class imbalance is efficiently relieved. Each new instance primarily focuses on the samples that are easily misclassified. In addition, the positive samples are assigned a higher cost weight than the negative samples. MiRNAClassify is compared with several state-of-the-art methods and some well-known classification models on human, animal, and plant datasets. The cross-validation results indicate that MiRNAClassify significantly outperforms the other methods and models. In addition, newly added pre-miRNAs are used to further evaluate the ability of these methods to discover novel pre-miRNAs. MiRNAClassify still achieves consistently superior performance and can discover more pre-miRNAs. Yingli Zhong, Ping Xuan, Ke Han, Weiping Zhang, and Jianzhong Li Copyright © 2015 Yingli Zhong et al. All rights reserved. Comparative Genome and Network Centrality Analysis to Identify Drug Targets of Mycobacterium tuberculosis H37Rv Thu, 05 Nov 2015 13:16:24 +0000 Potential drug targets of Mycobacterium tuberculosis H37Rv were identified through systematically integrated comparative genome and network centrality analysis.
The comparative analysis of the complete genome of Mycobacterium tuberculosis H37Rv against the Database of Essential Genes (DEG) yields a list of proteins that are essential for the growth and survival of the pathogen. Those proteins that are nonhomologous to human proteins were selected. The resulting proteins were then prioritized using four network centrality measures: degree, closeness, betweenness, and eigenvector centrality. Proteins whose centrality values are close to the centre of gravity of the interactome network were proposed as a final list of potential drug targets for the pathogen. The use of an integrated approach is believed to increase the success of the drug target identification process. For the purpose of validation, selective comparisons have been made between the proposed targets and drug targets previously identified by various other methods. About half of these proteins have already been reported as potential drug targets. We believe that the identified proteins will be an important input to experimental studies, which could save a considerable amount of time and cost in drug target discovery. Tilahun Melak and Sunita Gakkhar Copyright © 2015 Tilahun Melak and Sunita Gakkhar. All rights reserved. The Roles of miR-26, miR-29, and miR-203 in the Silencing of the Epigenetic Machinery during Melanocyte Transformation Wed, 04 Nov 2015 08:28:53 +0000 The epigenetic marks located throughout the genome exhibit great variation between normal and transformed cancer cells. While normal cells contain hypomethylated CpG islands near gene promoters and hypermethylated repetitive DNA, the opposite pattern is observed in cancer cells. Recently, it has been reported that alteration in the microenvironment of melanocyte cells, such as substrate adhesion blockade, results in the selection of anoikis-resistant cells, which have tumorigenic characteristics.
Melanoma cells obtained through this model show an altered epigenetic pattern, which represents one of the first events during the malignant transformation of melanocytes. Because microRNAs are involved in controlling components of the epigenetic machinery, the aim of this work was to evaluate the potential association between the expression of miR-203, miR-26, and miR-29 family members and the genes Dnmt3a, Dnmt3b, Mecp2, and Ezh2 during cell transformation. Our results show that the microRNAs and their validated or predicted targets are inversely expressed, indicating that these molecules are involved in epigenetic reprogramming. We also show that miR-203 downregulates Dnmt3b in mouse melanocyte cells. In addition, treatment with 5-aza-CdR promotes the expression of miR-26 and miR-29 in a nonmetastatic melanoma cell line. Considering the occurrence of CpG islands near the miR-26 and miR-29 promoters, these data suggest that they might be epigenetically regulated in cancer. Cláudia Regina Gasque Schoof, Alberto Izzotti, Miriam Galvonas Jasiulionis, and Luciana dos Reis Vasques Copyright © 2015 Cláudia Regina Gasque Schoof et al. All rights reserved. Mining for Candidate Genes Related to Pancreatic Cancer Using Protein-Protein Interactions and a Shortest Path Approach Tue, 03 Nov 2015 09:37:59 +0000 Pancreatic cancer (PC) is a highly malignant tumor derived from pancreatic tissue and is one of the leading causes of death from cancer. Its molecular mechanism has been partially revealed by validating its oncogenes and tumor suppressor genes; however, the available data remain insufficient for medical workers to design effective treatments. Large-scale identification of PC-related genes can promote studies on PC. In this study, we propose a computational method for mining new candidate PC-related genes.
A large network was constructed using protein-protein interaction information, and a shortest path approach was applied to mine new candidate genes based on validated PC-related genes. In addition, a permutation test was adopted to further select key candidate genes. Finally, for all discovered candidate genes, the likelihood that the genes are novel PC-related genes is discussed based on their currently known functions. Fei Yuan, Yu-Hang Zhang, Sibao Wan, ShaoPeng Wang, and Xiang-Yin Kong Copyright © 2015 Fei Yuan et al. All rights reserved. Big Data and Network Biology 2015 Sun, 01 Nov 2015 07:06:50 +0000 Shigehiko Kanaya, Md. Altaf-Ul-Amin, Samuel K. Kiboi, and Farit Mochamad Afendi Copyright © 2015 Shigehiko Kanaya et al. All rights reserved. An Effective Big Data Supervised Imbalanced Classification Approach for Ortholog Detection in Related Yeast Species Thu, 29 Oct 2015 13:34:56 +0000 Orthology detection requires more effective scaling algorithms. In this paper, a set of gene pair features based on similarity measures (alignment scores, sequence length, gene membership to conserved regions, and physicochemical profiles) is combined in a supervised pairwise ortholog detection approach to improve effectiveness, given the low ratio of orthologs among all possible pairwise comparisons between two genomes. In this scenario, big data supervised classifiers managing the imbalance between ortholog and nonortholog pair classes allow for an effective scaling solution built from two genomes and extended to other genome pairs. The supervised approach was compared with the RBH, RSD, and OMA algorithms using the following yeast genome pairs as benchmark datasets: Saccharomyces cerevisiae-Kluyveromyces lactis, Saccharomyces cerevisiae-Candida glabrata, and Saccharomyces cerevisiae-Schizosaccharomyces pombe.
Because of the large amount of imbalanced data, the building and testing of the supervised model were only possible by using big data supervised classifiers managing imbalance. Evaluation metrics taking low ortholog ratios into account were applied. From the effectiveness perspective, MapReduce Random Oversampling combined with Spark SVM outperformed RBH, RSD, and OMA, probably because of the consideration of gene pair features beyond alignment similarities combined with the advances in big data supervised classification. Deborah Galpert, Sara del Río, Francisco Herrera, Evys Ancede-Gallardo, Agostinho Antunes, and Guillermin Agüero-Chapin Copyright © 2015 Deborah Galpert et al. All rights reserved. Frontiers in Integrative Genomics and Translational Bioinformatics Wed, 28 Oct 2015 13:36:26 +0000 Zhongming Zhao, Victor X. Jin, Yufei Huang, Chittibabu Guda, and Jianhua Ruan Copyright © 2015 Zhongming Zhao et al. All rights reserved. Using Weighted Sparse Representation Model Combined with Discrete Cosine Transformation to Predict Protein-Protein Interactions from Protein Sequence Wed, 28 Oct 2015 07:26:10 +0000 Increasing demand for the knowledge about protein-protein interactions (PPIs) is promoting the development of methods for predicting protein interaction network. Although high-throughput technologies have generated considerable PPIs data for various organisms, it has inevitable drawbacks such as high cost, time consumption, and inherently high false positive rate. For this reason, computational methods are drawing more and more attention for predicting PPIs. In this study, we report a computational method for predicting PPIs using the information of protein sequences. The main improvements come from adopting a novel protein sequence representation by using discrete cosine transform (DCT) on substitution matrix representation (SMR) and from using weighted sparse representation based classifier (WSRC). When performing on the PPIs dataset of Yeast, Human, and H. 
pylori, the method achieved average accuracies of 96.28%, 96.30%, and 86.74%, respectively, significantly better than previous methods. These promising results demonstrate that the proposed method is feasible, robust, and powerful. To further evaluate the proposed method, we compared it with the state-of-the-art support vector machine (SVM) classifier. Extensive experiments were also performed in which Yeast PPI samples were used as the training set to predict PPIs in five other species datasets. Yu-An Huang, Zhu-Hong You, Xin Gao, Leon Wong, and Lirong Wang Copyright © 2015 Yu-An Huang et al. All rights reserved. Systematic Analysis and Prediction of In Situ Cross Talk of O-GlcNAcylation and Phosphorylation Tue, 27 Oct 2015 06:51:33 +0000 Reversible posttranslational modification (PTM) plays a very important role in biological processes by changing the properties of proteins. As many proteins carry multiple PTMs, cross talk between PTMs has become an intriguing topic that draws much attention. Much current evidence suggests that PTMs work together to accomplish specific biological functions. However, both the general principles and the underlying mechanism of PTM cross talk remain elusive. In this study, using large-scale datasets, we performed evolutionary conservation analysis, gene ontology enrichment, and motif extraction on proteins with cross talk of O-GlcNAcylation and phosphorylation co-occurring on the same residue. We found that proteins with in situ O-GlcNAc/Phos cross talk were significantly enriched in some specific gene ontology terms and that no obvious evolutionary pressure was observed. Moreover, 3 functional motifs associated with O-GlcNAc/Phos sites were extracted. We further used sequence features and GO features to predict O-GlcNAc/Phos cross talk sites, separately from phosphorylated sites and from O-GlcNAcylated sites, using SVM models.
The AUC of the classifier based on phosphorylated sites is 0.896, and that of the classifier based on O-GlcNAcylated sites is 0.843. Both classifiers achieved relatively better performance than other existing methods. Heming Yao, Ao Li, and Minghui Wang Copyright © 2015 Heming Yao et al. All rights reserved. JPPRED: Prediction of Types of J-Proteins from Imbalanced Data Using an Ensemble Learning Method Mon, 26 Oct 2015 06:16:47 +0000 Different types of J-proteins perform distinct functions in chaperone processes and disease development. Accurate identification of J-protein types will provide significant clues to the mechanisms of J-proteins and contribute to drug development for diseases. In this study, an ensemble predictor called JPPRED is proposed for J-protein prediction using hybrid features, including split amino acid composition (SAAC), pseudo amino acid composition (PseAAC), and position specific scoring matrix (PSSM). To handle the imbalanced benchmark dataset, the synthetic minority oversampling technique (SMOTE) and an undersampling technique are applied. The average sensitivity of JPPRED based on each of the above individual feature spaces lies in the range 0.744–0.851, indicating the discriminative power of these features. In addition, JPPRED yields the highest average sensitivity of 0.875 using the hybrid feature space of SAAC, PseAAC, and PSSM. Compared to individual base classifiers, JPPRED obtains more balanced and better performance for each type of J-protein. To evaluate the prediction performance objectively, JPPRED is compared with a previous study. Encouragingly, JPPRED obtains balanced performance for each type of J-protein, which is significantly superior to that of the existing method. It is anticipated that JPPRED can be a potential candidate for J-protein prediction. Lina Zhang, Chengjin Zhang, Rui Gao, and Runtao Yang Copyright © 2015 Lina Zhang et al. All rights reserved.
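SMOTE, named in the JPPRED abstract, balances a dataset by creating synthetic minority samples on the line segments between a minority sample and one of its nearest minority-class neighbours. The following is a minimal sketch of that interpolation step, assuming Euclidean distance on numeric feature vectors; the function name and defaults are illustrative, and neither the undersampling step nor JPPRED's feature extraction is shown.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority-class samples by SMOTE-style interpolation.

    minority: list of feature vectors (lists of floats) from the minority class.
    n_synthetic: number of new samples to create.
    k: number of nearest minority neighbours to interpolate towards.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base` (excluding itself), Euclidean distance
        neighbours = sorted(
            (minority[j] for j in range(len(minority)) if j != i),
            key=lambda x: math.dist(x, base),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

Because every synthetic point lies between two existing minority samples, the new data stay inside the region already occupied by the minority class rather than duplicating points exactly, as plain random oversampling would.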
Identification of Gene Expression Pattern Related to Breast Cancer Survival Using Integrated TCGA Datasets and Genomic Tools Tue, 20 Oct 2015 14:15:40 +0000 Several large-scale human cancer genomics projects such as TCGA have offered huge amounts of genomic and clinical data, allowing researchers to identify meaningful genomic alterations involved in tumor development and metastasis. A web-based TCGA data analysis platform called TCGA4U was developed in this study. TCGA4U provides a visualization solution that illustrates the relationship between these genomic alterations and clinical data. A whole-genome screen of survival-related gene expression patterns in breast cancer was performed. The genes that affect breast cancer patient survival were divided into two patterns. The gene list of each pattern was analyzed separately with DAVID. The results showed that mitochondrial ribosomes play a more crucial role in cancer development. We also report that breast cancer patients with low HSPA2 expression had shorter overall survival. This differs markedly from findings on the HSPA2 expression pattern in other cancer types. TCGA4U provides a new perspective on the TCGA datasets. We believe it can inspire more biomedical researchers to study and explain the genomic alterations in cancer development and to discover more targeted therapies to help more cancer patients. Zhenzhen Huang, Huilong Duan, and Haomin Li Copyright © 2015 Zhenzhen Huang et al. All rights reserved. Proteomic Study to Survey the CIGB-552 Antitumor Effect Tue, 20 Oct 2015 11:43:43 +0000 CIGB-552 is a cell-penetrating peptide that exerts in vitro and in vivo antitumor effects on cancer cells. In the present work, the mechanism underlying this anticancer activity was studied using chemical proteomics and expression-based proteomics in cultured cancer cell lines. CIGB-552 interacts with at least 55 proteins, as determined by chemical proteomics.
A temporal differential proteomics analysis based on iTRAQ quantification was performed to identify CIGB-552-modulated proteins. The proteomic profile includes 72 proteins differentially expressed in response to CIGB-552 treatment. Proteins related to cell proliferation and apoptosis were identified by both approaches. In line with previous findings, the proteomic data revealed that CIGB-552 inhibits the NF-κB signaling pathway. Furthermore, proteins related to cell invasion were differentially modulated by CIGB-552 treatment, suggesting new potential for CIGB-552 as an anticancer agent. Overall, the current study contributes to a better understanding of the antitumor mechanism of action of CIGB-552. Arielis Rodríguez-Ulloa, Jeovanis Gil, Yassel Ramos, Lilian Hernández-Álvarez, Lisandra Flores, Brizaida Oliva, Dayana García, Aniel Sánchez-Puente, Alexis Musacchio-Lasa, Jorge Fernández-de-Cossio, Gabriel Padrón, Luis J. González López, Vladimir Besada, and Maribel Guerra-Vallespí Copyright © 2015 Arielis Rodríguez-Ulloa et al. All rights reserved. Personal Verification/Identification via Analysis of the Peripheral ECG Leads: Influence of the Personal Health Status on the Accuracy Mon, 19 Oct 2015 14:11:32 +0000 Traditional means of identity validation (PIN codes, passwords) and physiological and behavioral biometric characteristics (fingerprint, iris, and speech) are susceptible to hacker attacks and/or falsification. This paper presents a method for person verification/identification based on the correlation between present and previous recordings of the limb ECG leads I and II, of the first principal ECG component calculated from them, and of linear and nonlinear combinations of these signals. For the verification task, a one-to-one scenario is applied, and threshold values for the individual signals and their combinations are derived.
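The verification and identification rules in the ECG abstract both reduce to comparing correlation coefficients between a present and a previously recorded signal. The sketch below assumes each recording has already been reduced to an equal-length, time-aligned sample sequence for one lead (for example, an averaged heartbeat); that preprocessing, the 0.9 threshold, and the function names are illustrative assumptions, not the authors' derived thresholds or code.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length signals."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("signals must have the same length (>= 2)")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("constant signal: correlation undefined")
    return cov / math.sqrt(var_a * var_b)


def verify(present, enrolled, threshold=0.9):
    """One-to-one: accept the claimed identity if the correlation reaches the threshold."""
    return pearson(present, enrolled) >= threshold


def identify(present, database):
    """One-to-many: return the ID whose previously recorded ECG correlates best."""
    return max(database, key=lambda subject: pearson(present, database[subject]))
```

Verification needs a threshold because it must say yes or no for a single claimed identity, whereas identification only ranks candidates and so needs no threshold in this simplified form.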
The identification task uses a one-to-many scenario: the tested subject is identified by the maximal correlation with previously recorded ECGs in a database. The population-based ECG-ILSA database of 540 patients (147 healthy subjects, 175 patients with cardiac diseases, and 218 with hypertension) was considered. In addition, a common reference dataset, PTB (14 healthy individuals), with a short time interval between the two acquisitions was taken into account. The results on the ECG-ILSA database were satisfactory for healthy people, and accuracy did not decrease significantly for nonhealthy patients, demonstrating the robustness of the proposed method. On the PTB database, the method provides an identification accuracy of 92.9% and a verification sensitivity and specificity of 100% and 89.9%, respectively. Irena Jekova and Giovanni Bortolan Copyright © 2015 Irena Jekova and Giovanni Bortolan. All rights reserved. Building Integrated Ontological Knowledge Structures with Efficient Approximation Algorithms Tue, 13 Oct 2015 13:54:53 +0000 The integration of ontologies builds knowledge structures that bring new understanding of existing terminologies and their associations. With the steady increase in the number of ontologies, automatic integration of ontologies is preferable to manual solutions in many applications. However, available work on ontology integration is largely heuristic, without guarantees on the quality of the integration results. In this work, we focus on the integration of ontologies with hierarchical structures. We identified optimal structures in this problem and proposed optimal and efficient approximation algorithms for integrating a pair of ontologies. Furthermore, we extend the basic problem to the integration of a large number of ontologies and, correspondingly, propose an efficient approximation algorithm for integrating multiple ontologies.
The empirical study on both real ontologies and synthetic data demonstrates the effectiveness of our proposed approaches. In addition, the results of integrating the gene ontology with the National Drug File Reference Terminology suggest that our method provides a novel way to perform association studies between biomedical terms. Yang Xiang and Sarath Chandra Janga Copyright © 2015 Yang Xiang and Sarath Chandra Janga. All rights reserved. Predicting Drug-Target Interactions via Within-Score and Between-Score Mon, 12 Oct 2015 13:48:13 +0000 Network inference and local classification models have been shown to be useful in predicting new potential drug-target interactions (DTIs) to assist in drug discovery or drug repositioning. The idea is to represent drugs, targets, and their interactions as a bipartite network or an adjacency matrix. However, existing methods have not yet appropriately addressed several issues, such as weak inference for isolated subnetworks, biased classifiers derived from insufficient positive samples, the need to train a number of local classifiers, and the unavailable relationship between known DTIs and unapproved drug-target pairs (DTPs). Designing more effective approaches to address these issues is therefore desirable. In this paper, after presenting better drug similarities and target similarities, we characterize each DTP as a feature vector of within-scores and between-scores, which offers the following advantages: (1) a uniform vector for all types of DTPs, (2) only one global classifier, with less bias, benefiting from adequate positive samples, and (3) more importantly, a visualized relationship between known DTIs and unapproved DTPs. The effectiveness of our approach is demonstrated by comparison with other popular methods under cross validation and by predicting potential interactions for DTPs, validated against existing databases.
Jian-Yu Shi, Zun Liu, Hui Yu, and Yong-Jun Li Copyright © 2015 Jian-Yu Shi et al. All rights reserved. Sequence-Based Prediction of RNA-Binding Proteins Using Random Forest with Minimum Redundancy Maximum Relevance Feature Selection Mon, 12 Oct 2015 11:18:29 +0000 The prediction of RNA-binding proteins is one of the most challenging problems in computational biology. Although some studies have investigated this problem, prediction accuracy is still not sufficient. In this study, a highly accurate method was developed to predict RNA-binding proteins from amino acid sequences using random forests with the minimum redundancy maximum relevance (mRMR) method, followed by incremental feature selection (IFS). We incorporated conjoint triad features and three novel features: binding propensity (BP), nonbinding propensity (NBP), and evolutionary information combined with physicochemical properties (EIPP). The results showed that these novel features play important roles in improving the performance of the predictor. Using the mRMR-IFS method, our predictor achieved its best performance (86.62% accuracy and a Matthews correlation coefficient of 0.737). The high prediction accuracy suggests that our method can be a useful approach to identify RNA-binding proteins from sequence information. Xin Ma, Jing Guo, and Xiao Sun Copyright © 2015 Xin Ma et al. All rights reserved. RNAseq by Total RNA Library Identifies Additional RNAs Compared to Poly(A) RNA Library Mon, 12 Oct 2015 09:19:06 +0000 The most popular RNA library used for RNA sequencing is the poly(A) captured RNA library. This library captures RNA based on the presence of poly(A) tails at the 3′ end. Another type of RNA library for RNA sequencing is the total RNA library, which differs from the poly(A) library in capture method and price. The total RNA library costs more, and its capture of RNA does not depend on the presence of poly(A) tails.
In practice, only ribosomal RNAs and small RNAs are washed out during total RNA library preparation. To evaluate the ability of both RNA libraries to detect RNA, we designed a study using RNA sequencing data from the same two breast cancer cell lines prepared with both libraries. We found that the RNA expression values captured by the two libraries were highly correlated. However, the number of RNAs captured was significantly higher for the total RNA library. Furthermore, we identified several subsets of protein-coding RNAs that were not captured efficiently by the poly(A) library. One of the most noticeable is the histone-encoding genes, which lack poly(A) tails. Yan Guo, Shilin Zhao, Quanhu Sheng, Mingsheng Guo, Brian Lehmann, Jennifer Pietenpol, David C. Samuels, and Yu Shyr Copyright © 2015 Yan Guo et al. All rights reserved. Construction of Pancreatic Cancer Classifier Based on SVM Optimized by Improved FOA Mon, 12 Oct 2015 08:57:17 +0000 A novel method is proposed to build a pancreatic cancer classifier. First, the quantum concepts used and the fruit fly optimization algorithm (FOA) are introduced. Then FOA is improved by quantum coding and quantum operations, and a new smell concentration determination function is defined. Finally, the improved FOA is used to optimize the parameters of a support vector machine (SVM), and the classifier is built with the optimized SVM. To verify the effectiveness of the proposed method, SVM and other classification methods were chosen for comparison. The experimental results show that the proposed method improves classifier performance and takes less time. Huiyan Jiang, Di Zhao, Ruiping Zheng, and Xiaoqi Ma Copyright © 2015 Huiyan Jiang et al. All rights reserved. OperomeDB: A Database of Condition-Specific Transcription Units in Prokaryotic Genomes Mon, 12 Oct 2015 08:53:14 +0000 Background.
In prokaryotic organisms, a substantial fraction of adjacent genes are organized into operons: codirectionally organized genes that share a common promoter and terminator. Although several available operon databases provide information with varying levels of reliability, very few resources provide experimentally supported results. Therefore, we believe that the biological community could benefit from a new operon prediction database with operons predicted from next-generation RNA-seq datasets. Description. We present operomeDB, a database that provides an ensemble of all the predicted operons for bacterial genomes using available RNA-sequencing datasets across a wide range of experimental conditions. Although several studies have recently confirmed that prokaryotic operon structure is dynamic, with significant alterations across environmental and experimental conditions, there are no comprehensive databases for studying such variations across prokaryotic transcriptomes. Currently, our database contains nine bacterial organisms and 168 transcriptomes for which we predicted operons. The user interface is simple and easy to use for visualizing, downloading, and querying data. In addition, because it can load custom datasets, users can also compare their datasets with publicly available transcriptomic data for an organism. Conclusion. OperomeDB should not only aid experimental groups working on transcriptome analysis of specific organisms but also enable studies in computational and comparative operomics. Kashish Chetal and Sarath Chandra Janga Copyright © 2015 Kashish Chetal and Sarath Chandra Janga. All rights reserved.
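The operon definition in the OperomeDB abstract (adjacent, codirectional genes sharing a promoter and terminator) suggests how RNA-seq coverage can delimit transcription units, although the abstract does not state OperomeDB's actual prediction rule. The sketch below uses an assumed, simplified heuristic: adjacent genes on the same strand are merged when the intergenic gap is short and its mean read depth stays above a threshold. The `Gene` record, the precomputed `gap_coverage` mapping, and the `max_gap`/`min_coverage` values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int   # 1-based start coordinate
    end: int     # inclusive end coordinate
    strand: str  # "+" or "-"


def predict_operons(genes, gap_coverage, max_gap=150, min_coverage=5.0):
    """Group adjacent same-strand genes into putative transcription units.

    gap_coverage maps (left_gene_name, right_gene_name) to the mean RNA-seq
    read depth over the intergenic region between them (assumed precomputed
    for one condition). Returns a list of operons, each a list of gene names.
    """
    genes = sorted(genes, key=lambda g: g.start)
    if not genes:
        return []
    operons = []
    current = [genes[0]]
    for left, right in zip(genes, genes[1:]):
        gap = right.start - left.end - 1
        # link two neighbours only if they are codirectional, close, and transcribed across the gap
        linked = (
            left.strand == right.strand
            and gap <= max_gap
            and gap_coverage.get((left.name, right.name), 0.0) >= min_coverage
        )
        if linked:
            current.append(right)
        else:
            operons.append([g.name for g in current])
            current = [right]
    operons.append([g.name for g in current])
    return operons
```

Because `gap_coverage` would be computed per RNA-seq sample, running the function on coverage from different conditions can merge or split units, which mirrors the condition-specific variation the database is meant to capture.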
Coexpression Network Analysis of miRNA-142 Overexpression in Neuronal Cells Sun, 11 Oct 2015 14:01:19 +0000 MicroRNAs are small noncoding RNA molecules that are differentially expressed in diverse biological processes and are involved in the regulation of multiple genes. A number of sites in the 3′ untranslated regions (UTRs) of different mRNAs allow complementary binding of a microRNA, leading to posttranscriptional regulation of those mRNAs. miRNA-142 is one of the microRNAs overexpressed in neurons and has been found to regulate the SIRT1 and MAOA genes. Differential analysis of gene expression data, which focuses on identifying up- or downregulated genes, ignores many relationships between genes affected by miRNA-142 overexpression in a cell. We therefore applied a correlation network model to identify coexpressed genes and to study the impact of miRNA-142 overexpression on this network. Combining multiple sources of knowledge is useful for inferring meaningful relationships in systems biology. We applied a coexpression model to data obtained from wild-type and miR-142-overexpressing neuronal cells and integrated miRNA seed sequence mapping information to identify genes strongly affected by this overexpression. Larger differences in the enriched networks revealed that nervous system development-related genes such as TEAD2, PLEKHA6, and POGLUT1 were strongly affected by miRNA-142 overexpression. Ishwor Thapa, Howard S. Fox, and Dhundy Bastola Copyright © 2015 Ishwor Thapa et al. All rights reserved. Assessing Computational Steps for CLIP-Seq Data Analysis Sun, 11 Oct 2015 13:53:05 +0000 RNA-binding proteins (RBPs) are key players in regulating gene expression at the posttranscriptional level. CLIP-Seq, which can provide a genome-wide map of protein-RNA interactions, has been increasingly used to decipher RBP-mediated posttranscriptional regulation.
Generating highly reliable binding sites from CLIP-Seq requires not only stringent library preparation but also considerable computational effort. Here we present a first systematic evaluation of the major computational steps for identifying RBP binding sites from CLIP-Seq data, including preprocessing, the choice of control samples, peak normalization, and motif discovery. We found that avoiding PCR amplification artifacts, normalizing to input RNA or mRNA-seq, and defining the background model from control samples can reduce the bias introduced by RNA abundance and improve the quality of detected binding sites. Our findings can serve as a general guideline for CLIP experiment design and the comprehensive analysis of CLIP-Seq data. Qi Liu, Xue Zhong, Blair B. Madison, Anil K. Rustgi, and Yu Shyr Copyright © 2015 Qi Liu et al. All rights reserved. Classification of Cancer Primary Sites Using Machine Learning and Somatic Mutations Sun, 11 Oct 2015 13:45:41 +0000 An accurate classification of human cancer, including its primary site, is important for a better understanding of cancer and for the development of effective therapeutic strategies. The available big data on somatic mutations provide a great opportunity to investigate cancer classification using machine learning. Here, we used support vector machines to explore the patterns of 1,760,846 somatic mutations identified from 230,255 cancer patients, together with gene function information. Specifically, we performed a multiclass classification experiment over 17 tumor sites for 6,751 subjects, using the gene symbol, somatic mutation, chromosome, and gene functional pathway as predictors. The baseline using only gene features achieved an accuracy of 0.57. Accuracy improved to 0.62 when mutation and chromosome information was added.
Among the predictable primary tumor sites, the prediction of five primary sites (large intestine, liver, skin, pancreas, and lung) achieved F-measures above 0.70. The large intestine model ranked first, with an F-measure of 0.87. The results demonstrate that somatic mutation information is useful for predicting primary tumor sites with machine learning models. To our knowledge, this study is the first investigation of primary site classification using machine learning and somatic mutation data. Yukun Chen, Jingchun Sun, Liang-Chin Huang, Hua Xu, and Zhongming Zhao Copyright © 2015 Yukun Chen et al. All rights reserved. A Comparison of Variant Calling Pipelines Using Genome in a Bottle as a Reference Sun, 11 Oct 2015 13:43:57 +0000 High-throughput sequencing, especially of exomes, is a popular diagnostic tool, but it is difficult to determine which tools are best at analyzing these data. In this study, we use the NIST Genome in a Bottle results as a novel resource for validating our exome analysis pipeline. We use six different aligners and five different variant callers to determine which of the 30 resulting pipelines performs best on a human exome that was used to help generate the list of variants detected by the Genome in a Bottle Consortium. Of these 30 pipelines, we found that Novoalign in conjunction with GATK UnifiedGenotyper exhibited the highest sensitivity while maintaining a low number of false positives for SNVs. However, indels are clearly still difficult for any pipeline to handle: none of the tools achieved an average sensitivity higher than 33% or a positive predictive value (PPV) higher than 53%. Lastly, as expected, aligners were found to play as vital a role in variant detection as the variant callers themselves. Adam Cornish and Chittibabu Guda Copyright © 2015 Adam Cornish and Chittibabu Guda. All rights reserved.
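Both the variant calling comparison (sensitivity and PPV) and the primary-site classifier (F-measure) report metrics derived from the same true positive, false positive, and false negative counts. The sketch below computes them for a call set against a truth set, assuming variants are `(chrom, pos, ref, alt)` tuples matched exactly. This is a simplification: real benchmarking against Genome in a Bottle has to reconcile different representations of the same variant, particularly for indels. The function name is illustrative.

```python
def benchmark(called, truth):
    """Compare a call set against a truth set of (chrom, pos, ref, alt) tuples."""
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # calls confirmed by the truth set
    fp = len(called - truth)   # calls absent from the truth set
    fn = len(truth - called)   # true variants that were missed
    sensitivity = tp / (tp + fn) if truth else 0.0  # also called recall
    ppv = tp / (tp + fp) if called else 0.0         # also called precision
    # F-measure: harmonic mean of sensitivity and PPV
    f_measure = (2 * sensitivity * ppv / (sensitivity + ppv)
                 if sensitivity + ppv else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "ppv": ppv, "f_measure": f_measure}
```

In a multiclass setting such as the 17 tumor sites, the same per-class counts would be obtained by treating one site as positive and all others as negative.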
How to Choose In Vitro Systems to Predict In Vivo Drug Clearance: A System Pharmacology Perspective Sun, 11 Oct 2015 13:35:32 +0000 Using in vitro metabolism data to predict human clearance has become increasingly important for the large-scale prediction of drug clearance. The relevant information (in vitro metabolism data and in vivo human clearance values) for thirty-five drugs that satisfied the entry criteria for probe drugs was collated from the literature. The performance of different in vitro systems, including the Escherichia coli system, yeast system, lymphoblastoid system, and baculovirus system, was then compared after in vitro-in vivo extrapolation. The baculovirus system, which can provide most of the data, has almost the same accuracy as the other systems in predicting clearance. In most cases, the baculovirus system also has a smaller coefficient of variation (CV) in its scaling factors. Therefore, the baculovirus system can be regarded as a suitable system for large-scale drug clearance prediction. Lei Wang, ChienWei Chiang, Hong Liang, Hengyi Wu, Weixing Feng, Sara K. Quinney, Jin Li, and Lang Li Copyright © 2015 Lei Wang et al. All rights reserved. A Genetic Algorithm Based Support Vector Machine Model for Blood-Brain Barrier Penetration Prediction Sun, 04 Oct 2015 11:09:01 +0000 The blood-brain barrier (BBB) is a highly complex physical barrier that determines which substances are allowed to enter the brain. Support vector machine (SVM) is a kernel-based machine learning method that is widely used in QSAR studies. For a successful SVM model, the kernel parameters and the feature subset are the most important factors affecting prediction accuracy. In most studies, they are treated as two independent problems, but it has been shown that they can affect each other. We designed and implemented a genetic algorithm (GA) to optimize the kernel parameters and feature subset simultaneously for SVM regression and applied it to BBB penetration prediction.
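The central idea of the BBB abstract is to encode the kernel parameters and the feature subset in one chromosome so that both are optimized together. The following is a minimal sketch of such a GA, assuming an RBF-style kernel with parameters C and gamma searched on a log2 scale, tournament selection, blend/uniform crossover, elitism, and a caller-supplied `fitness` function; these operator choices and all names are illustrative, not the authors' published settings.

```python
import random


def genetic_search(fitness, n_features, pop_size=30, generations=40,
                   mutation_rate=0.05, seed=0):
    """Jointly search kernel parameters and a feature mask with a simple GA.

    A chromosome is (log2_c, log2_gamma, mask): two real-valued genes for
    hypothetical RBF-SVM parameters C and gamma (log2 scale) plus a tuple of
    0/1 flags selecting descriptors. `fitness` maps a chromosome to a score
    to maximise.
    """
    rng = random.Random(seed)

    def random_chromosome():
        return (rng.uniform(-5, 15), rng.uniform(-15, 3),
                tuple(rng.randint(0, 1) for _ in range(n_features)))

    def tournament(pop, scores, k=3):
        # pick k random individuals and return the fittest of them
        picks = rng.sample(range(len(pop)), min(k, len(pop)))
        return pop[max(picks, key=lambda i: scores[i])]

    def crossover(a, b):
        # arithmetic blend of the real genes, uniform crossover of the mask
        w = rng.random()
        log2_c = w * a[0] + (1 - w) * b[0]
        log2_gamma = w * a[1] + (1 - w) * b[1]
        mask = tuple(x if rng.random() < 0.5 else y for x, y in zip(a[2], b[2]))
        return (log2_c, log2_gamma, mask)

    def mutate(ch):
        log2_c, log2_gamma, mask = ch
        if rng.random() < mutation_rate:
            log2_c += rng.gauss(0, 1)
        if rng.random() < mutation_rate:
            log2_gamma += rng.gauss(0, 1)
        # flip each feature bit independently with probability mutation_rate
        mask = tuple(1 - m if rng.random() < mutation_rate else m for m in mask)
        return (log2_c, log2_gamma, mask)

    pop = [random_chromosome() for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        scores = [fitness(ch) for ch in pop]
        # elitism: carry the current best individual over unchanged
        children = [pop[max(range(len(pop)), key=lambda i: scores[i])]]
        while len(children) < pop_size:
            parent_a = tournament(pop, scores)
            parent_b = tournament(pop, scores)
            children.append(mutate(crossover(parent_a, parent_b)))
        pop = children
        candidate = max(pop, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate
    return best
```

In practice `fitness` would train and cross-validate an SVM regressor on the selected descriptors with the decoded C and gamma and return, for example, a negative prediction error; any cheap function of the chromosome is enough to exercise the search loop.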
The results show that our GA/SVM model is more accurate than other currently available log BB models. Therefore, optimizing both the SVM parameters and the feature subset simultaneously with a genetic algorithm is a better approach than methods that treat the two problems separately. Analysis of our log BB model suggests that the carboxylic acid group, polar surface area (PSA)/hydrogen-bonding ability, lipophilicity, and molecular charge play important roles in BBB penetration. Among these properties, lipophilicity could enhance BBB penetration, while all the others are negatively correlated with it. Daqing Zhang, Jianfeng Xiao, Nannan Zhou, Mingyue Zheng, Xiaomin Luo, Hualiang Jiang, and Kaixian Chen Copyright © 2015 Daqing Zhang et al. All rights reserved. How to Use SNP_TATA_Comparator to Find a Significant Change in Gene Expression Caused by the Regulatory SNP of This Gene’s Promoter via a Change in Affinity of the TATA-Binding Protein for This Promoter Sun, 04 Oct 2015 07:28:06 +0000 The use of biomedical SNP markers of diseases can improve the effectiveness of treatment. Genotyping patients and then searching for SNPs that are more frequent than in the norm is the only commonly accepted method for identifying SNP markers within the framework of translational research. Bioinformatics applications aimed at the millions of unannotated SNPs from the “1000 Genomes” project can make this search for SNP markers more focused and less expensive. We used our Web service involving Fisher’s -score for candidate SNP markers to find a significant change in a gene’s expression. Here we analyzed the change caused by SNPs in the gene’s promoter via a change in the affinity of the TATA-binding protein for this promoter. We provide examples and discuss how to use this bioinformatics application in the practical analysis of unannotated SNPs from the “1000 Genomes” project.
Using known biomedical SNP markers, we identified 17 novel candidate SNP markers nearby: rs549858786 (rheumatoid arthritis); rs72661131 (cardiovascular events in rheumatoid arthritis); rs562962093 (stroke); rs563558831 (cyclophosphamide bioactivation); rs55878706 (malaria resistance, leukopenia); rs572527200 (asthma, systemic sclerosis, and psoriasis); rs371045754 (hemophilia B); rs587745372 (cardiovascular events); rs372329931, rs200209906, rs367732974, and rs549591993 (all four: cancer); rs17231520 and rs569033466 (both: atherosclerosis); rs63750953, rs281864525, and rs34166473 (all three: malaria resistance, thalassemia). Mikhail Ponomarenko, Dmitry Rasskazov, Olga Arkova, Petr Ponomarenko, Valentin Suslov, Ludmila Savinkova, and Nikolay Kolchanov Copyright © 2015 Mikhail Ponomarenko et al. All rights reserved. Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression, with Application to the Early Zebrafish Embryo Thu, 01 Oct 2015 13:15:34 +0000 Recent progress in microscopy technologies, biological markers, and automated processing methods is making it possible to develop gene expression atlases at cellular-level resolution over whole embryos. Raw gene expression data are usually very noisy. This noise comes from both experimental (technical/methodological) and true biological sources (stochastic biochemical processes). In addition, the cells or nuclei being imaged are irregularly arranged in 3D space. This makes the processing, extraction, and study of expression signals and intrinsic biological noise a serious challenge for 3D data, requiring new computational approaches. Here, we present a new approach for studying gene expression in nuclei located in a thick layer around a spherical surface. The method includes depth equalization on the sphere, flattening, interpolation to a regular grid, pattern extraction by Shaped 3D singular spectrum analysis (SSA), and interpolation back to the original nuclear positions.
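Shaped 3D SSA generalizes basic singular spectrum analysis, whose four steps are embedding, singular value decomposition, grouping, and diagonal averaging. The sketch below shows only basic 1D SSA on a regularly sampled series, assuming NumPy is available; it does not implement the shaped or 3D variants, and the window length and choice of components are left to the caller.

```python
import numpy as np


def ssa_reconstruct(x, window, components):
    """Basic 1D singular spectrum analysis (SSA) reconstruction.

    x: 1D series; window: embedding window length L (1 < L < len(x));
    components: indices of the singular triples to keep (grouping step).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    k = n - window + 1
    # 1. Embedding: column j is x[j:j+L], giving an L x K Hankel trajectory matrix
    traj = np.column_stack([x[j:j + window] for j in range(k)])
    # 2. Decomposition: singular value decomposition of the trajectory matrix
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    # 3. Grouping: sum the elementary rank-one matrices of the chosen components
    approx = np.zeros_like(traj)
    for i in components:
        approx += s[i] * np.outer(u[:, i], vt[i])
    # 4. Diagonal averaging: entry (i, j) belongs to series position i + j
    recon = np.zeros(n)
    counts = np.zeros(n)
    for i in range(window):
        for j in range(k):
            recon[i + j] += approx[i, j]
            counts[i + j] += 1
    return recon / counts
```

For a noisy periodic profile, keeping only the leading pair of components typically retains the oscillation while discarding much of the noise, which is the role pattern extraction plays in the pipeline described in the abstract.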
The approach is demonstrated on several examples of gene expression in the zebrafish egg (a model system in vertebrate development). The method is tested on several different data geometries (e.g., nuclear positions) and different forms of gene expression patterns. Fully 3D datasets of developmental gene expression are becoming increasingly available; we discuss the prospects of applying 3D-SSA to data processing and analysis in this growing field. Alex Shlemov, Nina Golyandina, David Holloway, and Alexander Spirov Copyright © 2015 Alex Shlemov et al. All rights reserved. Analysis of Chemical Properties of Edible and Medicinal Ginger by Metabolomics Approach Thu, 01 Oct 2015 13:06:22 +0000 In traditional herbal medicine, a comprehensive understanding of bioactive constituents is important for analyzing true medicinal function. We investigated the chemical properties of medicinal and edible ginger cultivars using a liquid chromatography-mass spectrometry (LC-MS) approach. Our PCA results indicate the importance of acetylated derivatives of gingerol, not gingerol or shogaol, as the medicinal indicator. A newly developed ginger cultivar, Z. officinale cv. Ogawa Umare or “Ogawa Umare” (OG), contains more active ingredients, showing its potential as a new resource for the production of ginger-derived herbal medicines in terms of both its chemical constituents and its rhizome yield. Ken Tanaka, Masanori Arita, Hiroaki Sakurai, Naoaki Ono, and Yasuhiro Tezuka Copyright © 2015 Ken Tanaka et al. All rights reserved. EMRlog Method for Computer Security for Electronic Medical Records with Logic and Data Mining Thu, 01 Oct 2015 13:04:50 +0000 Keeping a hospital computer system functioning properly is arduous work for managers and staff. Inconsistent policies are common and can cause serious problems, such as stolen information, frequent failures, and loss of all or part of the hospital's data.
This paper presents a new method, named EMRlog, for computer security systems in hospitals. EMRlog focuses on two kinds of security policies: directive and implemented policies. Security policies are applied to computer systems that handle huge amounts of information, such as databases, applications, and medical records. First, a syntactic verification step is applied using predicate logic. Then data mining techniques are used to detect which security policies have actually been implemented by the computer systems staff. Subsequently, the consistency of both kinds of policies is verified; in addition, these subsets are contrasted and validated. This is performed by an automatic theorem prover. Thus, many kinds of vulnerabilities can be removed to achieve a safer computer system. Sergio Mauricio Martínez Monterrubio, Juan Frausto Solis, and Raúl Monroy Borja Copyright © 2015 Sergio Mauricio Martínez Monterrubio et al. All rights reserved. Cellular Metabolic Network Analysis: Discovering Important Reactions in Treponema pallidum Thu, 01 Oct 2015 11:46:40 +0000 T. pallidum, the syphilis-causing pathogen, differs markedly in its metabolism from other bacterial pathogens. The development of a safe and effective syphilis vaccine requires identifying important steps in T. pallidum’s metabolism. Here, we apply Flux Balance Analysis to represent the reactions quantitatively, which makes it possible to cluster all reactions in T. pallidum. By calculating minimal cut sets and analyzing the topological structure of the metabolic network of T. pallidum, critical reactions are identified. For comparison, we also apply these analytical approaches to the metabolic network of H. pylori to find coregulated drug targets and drug targets unique to each microorganism. Based on the clustering results, all reactions are further classified into various roles.
As a result, a general picture of the metabolic network is obtained, and two types of reactions, both involved in nucleic acid metabolism, are found to be essential for T. pallidum. It is also found that both hub reactions and isolated reactions in purine and pyrimidine metabolism play important roles in T. pallidum. These reactions could be potential drug targets for treating syphilis. Xueying Chen, Min Zhao, and Hong Qu Copyright © 2015 Xueying Chen et al. All rights reserved. Development of Self-Compressing BLSOM for Comprehensive Analysis of Big Sequence Data Thu, 01 Oct 2015 07:26:59 +0000 With the remarkable increase in genomic sequence data from various organisms, novel tools are needed for comprehensive analyses of the available big sequence data. We previously developed a Batch-Learning Self-Organizing Map (BLSOM), which clusters genomic fragment sequences according to phylotype based solely on oligonucleotide composition and has been applied to genomic and metagenomic studies. BLSOM is suitable for high-performance parallel computing and can analyze big data simultaneously, but a large-scale BLSOM requires large computational resources. We have developed Self-Compressing BLSOM (SC-BLSOM) to reduce computation time, which allows us to carry out comprehensive analysis of big sequence data without high-performance supercomputers. The strategy of SC-BLSOM is to construct BLSOMs hierarchically according to data class, such as phylotype. First-layer BLSOMs were constructed separately for each divided piece of the input data representing a data subclass, such as a phylotype division, which compresses the number of data points. The second-layer BLSOM was then constructed from all of the weight vectors obtained in the first-layer BLSOMs. We compared SC-BLSOM with the conventional BLSOM by analyzing bacterial genome sequences.
SC-BLSOM was constructed faster than BLSOM and clustered the sequences according to phylotype with high accuracy, showing the method’s suitability for efficient knowledge discovery from big sequence data. Akihito Kikuchi, Toshimichi Ikemura, and Takashi Abe Copyright © 2015 Akihito Kikuchi et al. All rights reserved. Discovering Distinct Functional Modules of Specific Cancer Types Using Protein-Protein Interaction Networks Thu, 01 Oct 2015 07:05:17 +0000 Background. The molecular profiles exhibited in different cancer types are very different; hence, discovering distinct functional modules associated with specific cancer types is very important for understanding the distinct functions associated with them. Protein-protein interaction networks carry vital information about molecular interactions in cellular systems, and identification of functional modules (subgraphs) in these networks is one of the most important applications of biological network analysis. Results. In this study, we developed a new graph-theory-based method to identify distinct functional modules from nine different cancer protein-protein interaction networks. The method is composed of three major steps: (i) extracting modules from protein-protein interaction networks using network clustering algorithms; (ii) identifying distinct subgraphs from the derived modules; and (iii) identifying distinct subgraph patterns from distinct subgraphs. The subgraph patterns were evaluated using experimentally determined cancer-specific protein-protein interaction data from the Ingenuity knowledgebase, to identify distinct functional modules that are specific to each cancer type. Conclusion. We identified cancer-type specific subgraph patterns that may represent the functional modules involved in the molecular pathogenesis of different cancer types. Our method can serve as an effective tool to discover cancer-type specific functional modules from large protein-protein interaction networks.
Ru Shen, Xiaosheng Wang, and Chittibabu Guda Copyright © 2015 Ru Shen et al. All rights reserved. Development and Mining of a Volatile Organic Compound Database Thu, 01 Oct 2015 06:59:32 +0000 Volatile organic compounds (VOCs) are small molecules that exhibit high vapor pressure under ambient conditions and have low boiling points. Although VOCs contribute only a small proportion of the total metabolites produced by living organisms, they play an important role in chemical ecology, specifically in the biological interactions between organisms and ecosystems. VOCs are also important in the health care field, as they are presently used as biomarkers to detect various human diseases. Until now, information on VOCs has been scattered across the literature, and no database describing VOCs and their biological activities has been available. To fill this gap, we have developed the KNApSAcK Metabolite Ecology Database, which contains information on the relationships between VOCs and their emitting organisms. The KNApSAcK Metabolite Ecology Database is also linked with the KNApSAcK Core and KNApSAcK Metabolite Activity databases to provide further information on the metabolites and their biological activities. The VOC database can be accessed online. Azian Azamimi Abdullah, Md. Altaf-Ul-Amin, Naoaki Ono, Tetsuo Sato, Tadao Sugiura, Aki Hirai Morita, Tetsuo Katsuragi, Ai Muto, Takaaki Nishioka, and Shigehiko Kanaya Copyright © 2015 Azian Azamimi Abdullah et al. All rights reserved. Systematic Analysis of the Associations between Adverse Drug Reactions and Pathways Thu, 01 Oct 2015 06:52:17 +0000 Adverse drug reactions (ADRs) are responsible for drug candidate failure during clinical trials. It is crucial to investigate biological pathways contributing to ADRs. Here, we applied a large-scale analysis to identify overrepresented ADR-pathway combinations by merging clinical phenotypic data, biological pathway data, and drug-target relations.
Evaluation was performed by scientific literature review and by defining a pathway-based ADR-ADR similarity measure. The results showed that our method is efficient for finding associations between ADRs and pathways. To understand the mechanisms of ADRs more systematically, we constructed an ADR-pathway network and an ADR-ADR network. Network analysis from biological and pharmacological perspectives showed that frequent ADRs were associated with more pathways than infrequent and rare ADRs. Moreover, environmental information processing pathways contributed most to the observed ADRs. Integrating the system organ class of ADRs, we found that most classes tended to interact with other classes rather than with themselves. ADR classes were distributed promiscuously across all the ADR cliques. These results suggest that drug perturbation of a certain pathway can cause changes in multiple organs rather than in one specific organ. Our work not only provides a global view of the associations between ADRs and pathways but also helps to elucidate the mechanisms of ADRs. Xiaowen Chen, Yanqiu Wang, Pingping Wang, Baofeng Lian, Chunquan Li, Jing Wang, Xia Li, and Wei Jiang Copyright © 2015 Xiaowen Chen et al. All rights reserved. METSP: A Maximum-Entropy Classifier Based Text Mining Tool for Transporter-Substrate Identification with Semistructured Text Thu, 01 Oct 2015 06:50:59 +0000 The substrates of a transporter are useful not only for inferring the transporter’s function but also for discovering compound-compound interactions and reconstructing metabolic pathways. Although plenty of data have accumulated with the development of new technologies such as in vitro transporter assays, the search for substrates of transporters is far from complete. In this article, we introduce METSP, a maximum-entropy classifier designed to retrieve transporter-substrate pairs (TSPs) from semistructured text.
Based on high-quality annotation from UniProt, METSP achieves high precision and recall in cross-validation experiments. When METSP is applied to 182,829 human transporter annotation sentences in UniProt, it identifies 3942 sentences with transporter and compound information. Finally, 1547 high-confidence human TSPs are identified for further manual curation, 58.37% of which involve novel substrates not annotated in public transporter databases. METSP is the first efficient tool to extract TSPs from semistructured annotation text in UniProt. This tool can help to determine the precise substrates and drugs of transporters, thus facilitating drug-target prediction, metabolic network reconstruction, and literature classification. Min Zhao, Yanming Chen, Dacheng Qu, and Hong Qu Copyright © 2015 Min Zhao et al. All rights reserved. A Glimpse to Background and Characteristics of Major Molecular Biological Networks Wed, 30 Sep 2015 13:30:47 +0000 Recently, biology has become a data-intensive science because of the huge data sets produced by high-throughput molecular biological experiments in diverse areas, including genomics, transcriptomics, proteomics, and metabolomics. These huge datasets have paved the way for system-level analysis of the processes and subprocesses of the cell. For system-level understanding, the elements of a system are first connected based on their mutual relations to form a network. Among omics researchers, construction and analysis of biological networks have become highly popular. In this review, we briefly discuss both the biological background and topological properties of major types of omics networks to facilitate a comprehensive understanding and to conceptualize the foundation of network biology. Md. Altaf-Ul-Amin, Tetsuo Katsuragi, Tetsuo Sato, and Shigehiko Kanaya Copyright © 2015 Md. Altaf-Ul-Amin et al. All rights reserved.
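The review above refers to the topological properties of omics networks without naming them. As a minimal sketch, assuming an undirected network given as an edge list, the following computes two commonly used properties, the degree distribution P(k) and the local clustering coefficient, on a small hypothetical network (the node names and edges are invented for illustration and do not come from any dataset or from the review itself).

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy interaction network (undirected edge list); node names are placeholders.
EDGES = [
    ("A", "B"), ("A", "C"), ("B", "C"),  # a closed triangle
    ("C", "D"), ("D", "E"), ("D", "F"),  # a small star hanging off C
]


def build_adjacency(edges):
    """Return an undirected adjacency map: node -> set of neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_distribution(adj):
    """Map each degree k to the fraction of nodes with that degree, P(k)."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    n = len(adj)
    return {k: c / n for k, c in sorted(counts.items())}


def clustering_coefficient(adj, node):
    """Local clustering: fraction of neighbour pairs that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # undefined for degree < 2; reported as 0 by convention here
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


adj = build_adjacency(EDGES)
print(degree_distribution(adj))
print({n: round(clustering_coefficient(adj, n), 3) for n in sorted(adj)})
```

Nodes inside the triangle (A, B) get a clustering coefficient of 1, while the hub D, whose neighbours are not linked to each other, gets 0; on real omics networks the same two functions would simply be applied to a much larger edge list.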
Bioinformatics Methods and Biological Interpretation for Next-Generation Sequencing Data Mon, 07 Sep 2015 06:56:22 +0000 Guohua Wang, Yunlong Liu, Dongxiao Zhu, Gunnar W. Klau, and Weixing Feng Copyright © 2015 Guohua Wang et al. All rights reserved. MicroRNA Promoter Identification in Arabidopsis Using Multiple Histone Markers Thu, 03 Sep 2015 13:14:36 +0000 A microRNA is a small noncoding RNA molecule that functions in RNA silencing and posttranscriptional regulation of gene expression. To understand how microRNA genes are activated, the promoter regions driving their expression must be precisely annotated. Only a fraction of microRNA genes have confirmed transcription start sites (TSSs), which hinders our understanding of the transcription factor binding events. With the development of next-generation sequencing technology, chromatin states can be inferred precisely from combinations of specific histone modifications. Using the genome-wide profiles of nine histone markers (H3K4me2, H3K4me3, H3K9Ac, H3K9me2, H3K18Ac, H3K27me1, H3K27me3, H3K36me2, and H3K36me3), we developed a computational strategy to identify the promoter regions of most microRNA genes in Arabidopsis, based on the assumption that the distribution of histone markers around the TSSs of microRNA genes is similar to that around the TSSs of protein-coding genes. Among 298 miRNA genes, our model identified 42 independent miRNA TSSs and 132 miRNA TSSs that are located in the promoters of upstream genes. The identification of promoters will provide a better understanding of microRNA regulation and can play an important role in the study of diseases at the genetic level. Yuming Zhao, Fang Wang, and Liran Juan Copyright © 2015 Yuming Zhao et al. All rights reserved. Constructing a Genome-Wide LD Map of Wild A. gambiae Using Next-Generation Sequencing Thu, 03 Sep 2015 13:11:49 +0000 Anopheles gambiae is the major malaria vector in Africa.
Examining the molecular basis of A. gambiae traits requires knowledge of both the genetic variation and the genome-wide linkage disequilibrium (LD) map of wild A. gambiae populations from malaria-endemic areas. We sequenced the genomes of nine wild A. gambiae mosquitoes individually using next-generation sequencing technologies and detected 2,219,815 common single nucleotide polymorphisms (SNPs), 88% of which are novel. SNPs are not evenly distributed across A. gambiae chromosomes. The low-SNP-frequency regions overlap heterochromatin and chromosome inversion domains, consistent with the lower recombination rates in these regions. Nearly one million SNPs that were genotyped correctly in all individual mosquitoes with 99.6% confidence were extracted from these high-throughput sequencing data. Based on these SNP genotypes, we constructed a genome-wide LD map for wild A. gambiae from malaria-endemic areas in Kenya and made it available through a public website. The average size of LD blocks is less than 40 bp, and several large LD blocks were also found clustered around the para gene, which is consistent with the effect of insecticide-driven selective sweeps. The SNPs and the LD map will be valuable resources for the scientific community to dissect the A. gambiae genome. Xiaohong Wang, Yaw A. Afrane, Guiyun Yan, and Jun Li Copyright © 2015 Xiaohong Wang et al. All rights reserved. Survey of Programs Used to Detect Alternative Splicing Isoforms from Deep Sequencing Data In Silico Thu, 03 Sep 2015 11:55:27 +0000 Next-generation sequencing techniques have been rapidly emerging. However, the massive numbers of sequencing reads hide a great deal of unknown, important information. Advances have enabled researchers to discover alternative splicing (AS) sites and isoforms using computational approaches instead of molecular experiments.
Given the importance of AS for gene expression and protein diversity in eukaryotes, detecting alternative splicing and isoforms is a hot topic in systems biology and epigenetics research. The computational methods applied to AS prediction have improved since the emergence of next-generation sequencing. In this study, we introduce state-of-the-art research on AS and then compare the research methods and software tools available for AS based on next-generation sequencing reads. Finally, we discuss the prospects of computational methods related to AS. Feng Min, Sumei Wang, and Li Zhang Copyright © 2015 Feng Min et al. All rights reserved. Understanding Transcription Factor Regulation by Integrating Gene Expression and DNase I Hypersensitive Sites Thu, 03 Sep 2015 09:24:16 +0000 Transcription factors are proteins that bind to DNA sequences to regulate gene transcription. Transcription factor binding sites are short DNA sequences (5–20 bp long) specifically bound by one or more transcription factors. The identification of transcription factor binding sites and prediction of their function continue to be challenging problems in computational biology. In this study, transcription factor binding sites in gene regulatory regions are identified by integrating DNase I hypersensitive sites with known position weight matrices in the TRANSFAC database. Based on the global gene expression patterns in cervical cancer HeLaS3 cells and HelaS3-ifnα4h cells (HeLaS3 cells treated with interferon for 4 hours), we present a model-based computational approach to predict a set of transcription factors that potentially cause such differential gene expression. Notably, 6 out of 10 predicted functional factors, namely IRF, IRF-2, IRF-9, IRF-1, IRF-3, and ICSBP, belong to the interferon regulatory factor family and upregulate gene expression levels in response to the interferon treatment.
Another factor, ISGF-3, is also a transcriptional activator induced by interferon alpha. The predictions of our model were consistent across different criteria for selecting transcription factor binding sites. Our model demonstrated the potential to computationally identify the functional transcription factors in gene regulation. Guohua Wang, Fang Wang, Qian Huang, Yu Li, Yunlong Liu, and Yadong Wang Copyright © 2015 Guohua Wang et al. All rights reserved. Active Microbial Communities Inhabit Sulphate-Methane Interphase in Deep Bedrock Fracture Fluids in Olkiluoto, Finland Thu, 03 Sep 2015 09:23:57 +0000 Active microbial communities of deep crystalline bedrock fracture water were investigated in seven different boreholes in Olkiluoto (Western Finland) using 454 pyrosequencing targeting bacterial and archaeal 16S rRNA, dsrB, and mcrA gene transcripts. Over a depth range of 296–798 m below ground surface, the microbial communities changed according to depth, salinity gradient, and sulphate and methane concentrations. The highest bacterial diversity was observed in the sulphate-methane mixing zone (SMMZ) at 250–350 m depth, whereas archaeal diversity was highest at the lower boundary of the SMMZ. Sulphide-oxidizing ε-proteobacteria (Sulfurimonas sp.) dominated in the SMMZ and γ-proteobacteria (Pseudomonas spp.) below the SMMZ. The active archaeal communities consisted mostly of ANME-2D and Thermoplasmatales groups, although Methermicoccaceae, Methanobacteriaceae, and Thermoplasmatales (SAGMEG, TMG) were more common at 415–559 m depth. Typical indicator microorganisms for sulphate-methane transition zones in marine sediments, such as ANME-1 archaea, α-, β-, and δ-proteobacteria, JS1, Actinomycetes, Planctomycetes, Chloroflexi, and MBGB Crenarchaeota, were detected at specific depths. DsrB genes were most numerous and most actively transcribed in the SMMZ, while the mcrA gene concentration was highest in the deep methane-rich groundwater.
Our results demonstrate that active and highly diverse but sparse and stratified microbial communities inhabit the Fennoscandian deep bedrock ecosystems. Malin Bomberg, Mari Nyyssönen, Petteri Pitkänen, Anne Lehtinen, and Merja Itävaara Copyright © 2015 Malin Bomberg et al. All rights reserved. 454-Pyrosequencing Analysis of Bacterial Communities from Autotrophic Nitrogen Removal Bioreactors Utilizing Universal Primers: Effect of Annealing Temperature Thu, 03 Sep 2015 09:22:02 +0000 Identification of anaerobic ammonium oxidizing (anammox) bacteria by molecular tools aimed at evaluating bacterial diversity in autotrophic nitrogen removal systems is limited by the difficulty of designing universal primers for the Bacteria domain that can amplify the anammox 16S rRNA genes. A metagenomic analysis (pyrosequencing) of total bacterial diversity, including the anammox population, in five autotrophic nitrogen removal technologies, two bench-scale models (MBR and Low Temperature CANON) and three full-scale bioreactors (anammox, CANON, and DEMON), was successfully carried out by optimizing primer selection and PCR conditions (annealing temperature). The universal primer 530F was identified as the best candidate for covering both total bacterial and anammox bacterial diversity. The salt-adjusted optimum annealing temperature of primer 530F was calculated (47°C), and hence a range of annealing temperatures of 44–49°C was tested. Pyrosequencing data showed that an annealing temperature of 45°C yielded the best results in terms of species richness and diversity for all bioreactors analyzed. Alejandro Gonzalez-Martinez, Alejandro Rodriguez-Sanchez, Belén Rodelas, Ben A. Abbas, Maria Victoria Martinez-Toledo, Mark C. M. van Loosdrecht, F. Osorio, and Jesus Gonzalez-Lopez Copyright © 2015 Alejandro Gonzalez-Martinez et al. All rights reserved.
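The pyrosequencing abstract above ranks annealing temperatures by species richness and diversity but does not say which indices were used. As a minimal sketch, assuming observed OTU richness and the Shannon index H' = −Σ p_i ln p_i (a common choice, not confirmed by the source), the comparison could be computed from per-OTU read counts as follows; the two conditions and their counts are invented for illustration and are not data from the study.

```python
import math

# Hypothetical OTU read counts for two annealing-temperature conditions
# (illustrative only; NOT data from the study).
SAMPLES = {
    "condition_A": [120, 80, 0, 0, 5],
    "condition_B": [60, 50, 40, 30, 20],
}


def observed_richness(counts):
    """Number of OTUs with at least one read."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over OTUs with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


for name, counts in SAMPLES.items():
    print(name, observed_richness(counts), round(shannon_index(counts), 3))
```

Richness only counts which OTUs were detected, while the Shannon index also rewards evenness, so a condition that recovers more taxa in more balanced proportions (here `condition_B`) scores higher on both.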
mmnet: An R Package for Metagenomics Systems Biology Analysis Thu, 03 Sep 2015 09:22:02 +0000 The human microbiome plays important roles in human health and disease. Previous microbiome studies focused mainly on the function of single pure species and overlooked interactions within complex communities at the system level. A recently introduced metagenomic approach integrates metagenomic data with community-level metabolic network modeling, but no comprehensive tool was available for this kind of approach. To facilitate such studies, we developed an R package, mmnet, to implement community-level metabolic network reconstruction. The package also implements a set of functions for automatic analysis pipeline construction, including functional annotation of metagenomic reads, abundance estimation of enzymatic genes, community-level metabolic network reconstruction, and integrated network analysis. The results can be represented intuitively and sent to Cytoscape for further exploration. The package has substantial potential in metagenomic studies that focus on identifying system-level variations of the human microbiome associated with disease. Yang Cao, Xiaofei Zheng, Fei Li, and Xiaochen Bo Copyright © 2015 Yang Cao et al. All rights reserved. Genetic Interactions Explain Variance in Cingulate Amyloid Burden: An AV-45 PET Genome-Wide Association and Interaction Study in the ADNI Cohort Thu, 03 Sep 2015 09:20:58 +0000 Alzheimer’s disease (AD) is the most common neurodegenerative disorder. Using discrete disease status as the phenotype and computing statistics at the single-marker level may not address the underlying biological interactions that contribute to disease mechanisms and may contribute to the issue of “missing heritability.” We performed a genome-wide association study (GWAS) and a genome-wide interaction study (GWIS) of an amyloid imaging phenotype, using data from the Alzheimer’s Disease Neuroimaging Initiative.
We investigated the genetic main effects and interaction effects on cingulate amyloid-beta (Aβ) load in an effort to better understand the genetic etiology of Aβ deposition, a widely studied AD biomarker. PLINK was used in the single-marker GWAS, and INTERSNP was used to perform the two-marker GWIS, focusing only on SNPs with for the GWAS analysis. Age, sex, and diagnosis were used as covariates in both analyses. Bonferroni-corrected p values were reported. The GWAS analysis revealed significant hits within or proximal to the APOE, APOC1, and TOMM40 genes, which were previously implicated in AD. The GWIS analysis yielded 8 novel SNP-SNP interaction findings that warrant replication and further investigation. Jin Li, Qiushi Zhang, Feng Chen, Jingwen Yan, Sungeun Kim, Lei Wang, Weixing Feng, Andrew J. Saykin, Hong Liang, and Li Shen Copyright © 2015 Jin Li et al. All rights reserved. How to Isolate a Plant’s Hypomethylome in One Shot Thu, 03 Sep 2015 09:14:51 +0000 Genome assembly remains a challenge for large and/or complex plant genomes because of their abundant repetitive regions, which has led studies to focus on gene space instead of the whole genome. DNA enrichment strategies therefore facilitate assembly by increasing coverage while reducing the complexity of the whole genome. In this paper we provide an easy, fast, and cost-effective variant of MRE-seq to obtain a plant’s hypomethylome by an optimized methyl filtration protocol followed by next-generation sequencing. The method is demonstrated on three plant species known to have large and/or complex (polyploid) genomes: Oryza sativa, Picea abies, and Crocus sativus. The identified hypomethylomes show clear enrichment for genes and their flanking regions and a clear reduction of transposable elements. Additionally, genomic sequences around genes are captured, including regulatory elements in introns and up- and downstream flanks.
The high similarity between the results of a de novo assembly approach and reference-based mapping in rice supports the method’s applicability for studying and understanding the genomes of nonmodel organisms. Hence we show the high potential of MRE-seq in a wide range of scenarios for the direct analysis of methylation differences, for example, between ecotypes or individuals, or within or across species harbouring large and complex genomes. Elisabeth Wischnitzki, Eva Maria Sehr, Karin Hansel-Hohl, Maria Berenyi, Kornel Burg, and Silvia Fluch Copyright © 2015 Elisabeth Wischnitzki et al. All rights reserved. Data Acquisition and Processing in Biology and Medicine Wed, 26 Aug 2015 10:13:46 +0000 Cheng-Hong Yang, Yu-Jie Huang, An Liu, Yi Rong, and Tsair-Fwu Lee Copyright © 2015 Cheng-Hong Yang et al. All rights reserved. The Combinational Polymorphisms of ORAI1 Gene Are Associated with Preventive Models of Breast Cancer in the Taiwanese Tue, 25 Aug 2015 14:02:28 +0000 The ORAI calcium release-activated calcium modulator 1 (ORAI1) has been shown to be an important gene for breast cancer progression and metastasis. However, protective association models involving the single nucleotide polymorphisms (SNPs) of the ORAI1 gene had not been investigated. Based on a published data set of 345 female breast cancer patients and 290 female controls, we used a particle swarm optimization (PSO) algorithm to identify possible protective models of breast cancer association in terms of the SNPs of the ORAI1 gene. Results showed that the PSO-generated models of 2-SNP (rs12320939-TT/rs12313273-CC), 3-SNP (rs12320939-TT/rs12313273-CC/rs712853-(TT/TC)), 4-SNP (rs12320939-TT/rs12313273-CC/rs7135617-(GG/GT)/rs712853-(TT/TC)), and 5-SNP (rs12320939-TT/rs12313273-CC/rs7135617-(GG/GT)/rs6486795-CC/rs712853-(TT/TC)) displayed low odds ratios (0.409–0.425) for breast cancer association.
Taken together, these results suggest that our proposed PSO strategy is effective for identifying the combinational SNPs of rs12320939, rs12313273, rs7135617, rs6486795, and rs712853 of the ORAI1 gene with a strongly protective association in breast cancer. Fu Ou-Yang, Yu-Da Lin, Li-Yeh Chuang, Hsueh-Wei Chang, Cheng-Hong Yang, and Ming-Feng Hou Copyright © 2015 Fu Ou-Yang et al. All rights reserved. Automatic Artifact Removal from Electroencephalogram Data Based on A Priori Artifact Information Tue, 25 Aug 2015 08:22:17 +0000 Electroencephalogram (EEG) is susceptible to various nonneural physiological artifacts. Automatic artifact removal from EEG data remains a key challenge for extracting relevant information from brain activities. To adapt to variable subjects and EEG acquisition environments, this paper presents an automatic online artifact removal method based on a priori artifact information. The combination of discrete wavelet transform and independent component analysis (ICA), wavelet-ICA, was utilized to separate artifact components. The artifact components were then automatically identified using a priori artifact information, which was acquired in advance. Subsequently, signal reconstruction without artifact components was performed to obtain artifact-free signals. The results showed that, using this automatic online artifact removal method, there were statistically significant improvements in classification accuracy in both experiments, namely, motor imagery and emotion recognition. Chi Zhang, Li Tong, Ying Zeng, Jingfang Jiang, Haibing Bu, Bin Yan, and Jianxin Li Copyright © 2015 Chi Zhang et al. All rights reserved. Tennis Elbow Diagnosis Using Equivalent Uniform Voltage to Fit the Logistic and the Probit Diseased Probability Models Tue, 25 Aug 2015 07:46:17 +0000 This study aimed to develop the logistic and the probit models to analyse the electromyographic (EMG) equivalent uniform voltage- (EUV-) response for the tenderness of tennis elbow.
In total, 78 hands from 39 subjects were enrolled. In this study, surface EMG (sEMG) signal is obtained by an innovative device with electrodes over forearm region. The analytical endpoint was defined as Visual Analog Score (VAS) 3+ tenderness of tennis elbow. The logistic and the probit diseased probability (DP) models were established for the VAS score and EMG absolute voltage-time histograms (AVTH). TV50 is the threshold equivalent uniform voltage predicting a 50% risk of disease. Twenty-one out of 78 samples (27%) developed VAS 3+ tenderness of tennis elbow reported by the subject and confirmed by the physician. The fitted DP parameters were TV50 = 153.0 mV (CI: 136.3–169.7 mV), γ50 = 0.84 (CI: 0.78–0.90) and TV50 = 155.6 mV (CI: 138.9–172.4 mV), m = 0.54 (CI: 0.49–0.59) for logistic and probit models, respectively. When the EUV ≥ 153 mV, the DP of the patient is greater than 50% and vice versa. The logistic and the probit models are valuable tools to predict the DP of VAS 3+ tenderness of tennis elbow. Tsair-Fwu Lee, Wei-Chun Lin, Hung-Yu Wang, Shu-Yuan Lin, Li-Fu Wu, Shih-Sian Guo, Hsiang-Jui Huang, Hui-Min Ting, and Pei-Ju Chao Copyright © 2015 Tsair-Fwu Lee et al. All rights reserved. A Data Hiding Technique to Synchronously Embed Physiological Signals in H.264/AVC Encoded Video for Medicine Healthcare Tue, 25 Aug 2015 07:45:41 +0000 The recognition of clinical manifestations in both video images and physiological-signal waveforms is an important aid to improve the safety and effectiveness in medical care. Physicians can rely on video-waveform (VW) observations to recognize difficult-to-spot signs and symptoms. The VW observations can also reduce the number of false positive incidents and expand the recognition coverage to abnormal health conditions. The synchronization between the video images and the physiological-signal waveforms is fundamental for the successful recognition of the clinical manifestations. 
The use of conventional equipment to synchronously acquire and display the video-waveform information involves complex tasks such as the video capture/compression, the acquisition/compression of each physiological signal, and the video-waveform synchronization based on timestamps. This paper introduces a data hiding technique capable of both enabling embedding channels and synchronously hiding samples of physiological signals in encoded video sequences. Our data hiding technique offers large data capacity and reduces the complexity of video-waveform acquisition and reproduction. The experimental results revealed successful embedding and full restoration of the signals’ samples. Our results also demonstrated a small distortion in the video objective quality, a small increment in bit-rate, and embedded cost savings of −2.6196% for high and medium motion video sequences. Raul Peña, Alfonso Ávila, David Muñoz, and Juan Lavariega Copyright © 2015 Raul Peña et al. All rights reserved. Information-Theoretical Quantifier of Brain Rhythm Based on Data-Driven Multiscale Representation Mon, 24 Aug 2015 14:18:37 +0000 This paper presents a data-driven multiscale entropy measure to reveal the scale-dependent information quantity of electroencephalogram (EEG) recordings. This work is motivated by previous observations of the nonlinear and nonstationary nature of EEG over multiple time scales. Here, a new framework of entropy measures considering changing dynamics over multiple oscillatory scales is presented. First, to deal with nonstationarity over multiple scales, the EEG recording is decomposed using empirical mode decomposition (EMD), which is known to be effective for extracting the constituent narrowband components without a predetermined basis. Calculating the Renyi entropy of the probability distributions of the intrinsic mode functions extracted by EMD then yields a data-driven multiscale Renyi entropy.
To validate the performance of the proposed entropy measure, actual EEG recordings from rats experiencing 7 min of cardiac arrest followed by resuscitation were analyzed. Simulation and experimental results demonstrate that the multiscale Renyi entropy provides better discrimination of injury levels and improved correlation with the neurological deficit evaluation 72 hours after cardiac arrest, suggesting an effective diagnostic and prognostic tool. Young-Seok Choi Copyright © 2015 Young-Seok Choi. All rights reserved. Gene Network Analysis of Glucose Linked Signaling Pathways and Their Role in Human Hepatocellular Carcinoma Cell Growth and Survival in HuH7 and HepG2 Cell Lines Mon, 24 Aug 2015 11:19:10 +0000 Cancer progression may be affected by metabolism. In this study, we aimed to analyze the effect of glucose on the proliferation and/or survival of human hepatocellular carcinoma (HCC) cells. Human gene datasets regulated by glucose were compared to gene datasets either dysregulated in HCC or regulated by other signaling pathways. Significant numbers of common genes suggested putative involvement in transcriptional regulation by glucose. Real-time proliferation assays using high (4.5 g/L) versus low (1 g/L) glucose on two human HCC cell lines and specific inhibitors of selected pathways were used for experimental validation. High glucose promoted proliferation of the HuH7 cell line but not of the HepG2 cell line. Gene network analyses suggest that 92% of glucose-regulated gene transcription could be mediated through ChREBP in HepG2 cells, compared to 40% in either other human cells or rodent healthy liver, with alteration of LKB1 (serine/threonine kinase 11) and NOX (NADPH oxidases) signaling pathways and loss of transcriptional regulation of PPARGC1A (peroxisome-proliferator activated receptors gamma coactivator 1) target genes by high glucose.
Both PPARA and PPARGC1A regulate the transcription of genes that are also regulated by glycolysis, by the antidiabetic agent metformin, and by NOX, suggesting a major interplay among these pathways in the control of HCC progression. Emmanuelle Berger, Nathalie Vega, Michèle Weiss-Gayet, and Alain Géloën Copyright © 2015 Emmanuelle Berger et al. All rights reserved. Applying NGS Data to Find Evolutionary Network Biomarkers from the Early and Late Stages of Hepatocellular Carcinoma Thu, 20 Aug 2015 07:07:01 +0000 Hepatocellular carcinoma (HCC) is the main type of liver tumor (~80%), the others including hepatoblastomas, angiosarcomas, and cholangiocarcinomas. In this study, we used a systems biology approach to construct protein-protein interaction networks (PPINs) for early-stage and late-stage liver cancer. By comparing the networks of these two stages, we found that they share some mechanisms and differ significantly in others. To obtain differential network structures between cancer and noncancer PPINs, we constructed cancer and noncancer PPINs for the two stages of liver cancer with a systems biology method, using NGS data from cancer cells and adjacent noncancer cells. Using carcinogenesis relevance values (CRVs), we identified 43 and 80 significant proteins and their PPINs (network markers) for early-stage and late-stage liver cancer, respectively. To investigate the evolution of network biomarkers in the carcinogenesis process, we performed a primary pathway analysis, which showed that the pathways common to the early and late stages were those related to ordinary cancer mechanisms. A pathway specific to the early stage was the mismatch repair pathway, while pathways specific to the late stage were the spliceosome pathway, lysine degradation pathway, and progesterone-mediated oocyte maturation pathway. This study provides a new direction for cancer-targeted therapies at different stages.
Yung-Hao Wong, Chia-Chou Wu, Chih-Lung Lin, Ting-Shou Chen, Tzu-Hao Chang, and Bor-Sen Chen Copyright © 2015 Yung-Hao Wong et al. All rights reserved. The ABCC6 Transporter as a Paradigm for Networking from an Orphan Disease to Complex Disorders Tue, 18 Aug 2015 09:35:55 +0000 Knowledge of the genetic etiology of complex disorders largely results from the study of rare monogenic disorders. These common and rare diseases often show phenotypic overlap, although monogenic diseases generally have more extreme symptomatology. ABCC6, the gene responsible for pseudoxanthoma elasticum, an autosomal recessive ectopic mineralization disorder, can be considered a paradigm gene whose relevance reaches far beyond this enigmatic orphan disease. Indeed, common traits such as chronic kidney disease or cardiovascular disorders have been linked to the ABCC6 gene. While awareness of the wide ramifications of ABCC6 has increased significantly during the last decade, the gene itself and the transmembrane transporter it encodes have not yet revealed all of their mysteries. To gain more insight, multiple approaches are being used, including next-generation sequencing, computational methods, and various “omics” technologies. Much effort is being made to place the vast amount of data gathered into an integrated systems-biology network; the involvement of ABCC6 in common disorders illustrates the wide implications and potential of such a network. In this review, we summarize the network approaches used to study ABCC6 and the role of this gene in several complex diseases. Eva Y. G. De Vilder, Mohammad Jakir Hosen, and Olivier M. Vanakker Copyright © 2015 Eva Y. G. De Vilder et al. All rights reserved.
An Affinity Propagation-Based DNA Motif Discovery Algorithm Mon, 10 Aug 2015 09:57:56 +0000 The planted motif search (PMS) is one of the fundamental problems in bioinformatics and plays an important role in locating transcription factor binding sites (TFBSs) in DNA sequences. Identifying weak motifs and reducing the effect of local optima remain important but challenging tasks for motif discovery. To address them, we propose a new algorithm, APMotif, which first applies Affinity Propagation (AP) clustering to DNA sequences to produce informative, high-quality candidate motifs and then employs Expectation Maximization (EM) refinement to obtain the optimal motifs from the candidates. Experimental results on both simulated and real biological data sets show that APMotif usually outperforms four other widely used algorithms in prediction accuracy. Chunxiao Sun, Hongwei Huo, Qiang Yu, Haitao Guo, and Zhigang Sun Copyright © 2015 Chunxiao Sun et al. All rights reserved. Statistical Analysis of High-Dimensional Genetic Data in Complex Traits Tue, 04 Aug 2015 14:52:57 +0000 Taesung Park, Kristel Van Steen, Xiang-Yang Lou, and Momiao Xiong Copyright © 2015 Taesung Park et al. All rights reserved. Evaluation of Penalized and Nonpenalized Methods for Disease Prediction with Large-Scale Genetic Data Tue, 04 Aug 2015 11:27:26 +0000 Owing to recent improvements in genotyping technology, large-scale genetic data can be used to identify disease susceptibility loci, and these findings have substantially improved our understanding of complex diseases. However, despite these successes, most genetic effects for many complex diseases have been found to be very small, which has been a major hurdle to building disease prediction models. Recently, many statistical methods based on penalized regression have been proposed to tackle the so-called “large P and small N” problem.
Penalized regressions, including the least absolute shrinkage and selection operator (LASSO) and ridge regression, constrain the parameter space, and this constraint makes it possible to estimate the effects of a very large number of SNPs. Various extensions have been suggested, and in this report we compare their accuracy by applying them to several complex diseases. Our results show that penalized regressions are usually robust and provide better accuracy than the existing methods, at least for the diseases under consideration. Sungho Won, Hosik Choi, Suyeon Park, Juyoung Lee, Changyi Park, and Sunghoon Kwon Copyright © 2015 Sungho Won et al. All rights reserved. Detection of Epistatic and Gene-Environment Interactions Underlying Three Quality Traits in Rice Using High-Throughput Genome-Wide Data Tue, 04 Aug 2015 11:23:29 +0000 With the development of sequencing technology, dense single nucleotide polymorphism (SNP) data have become available, enabling the genetic architecture of complex traits to be uncovered by genome-wide association studies (GWAS). However, current GWAS strategies usually ignore epistatic and gene-environment interactions owing to the absence of appropriate methodology and the heavy computational burden. This study proposes a new GWAS strategy that combines the graphics processing unit- (GPU-) based generalized multifactor dimensionality reduction (GMDR) algorithm with a mixed linear model approach. The reliability and efficiency of the analytical methods were verified through Monte Carlo simulations, which suggested that a population of nearly 150 recombinant inbred lines (RILs) provides reasonable resolution for the scenarios considered.
Further, a GWAS was conducted with this two-step strategy to investigate the additive, epistatic, and gene-environment associations between 701,867 SNPs and three important quality traits (gelatinization temperature, amylose content, and gel consistency) in an RIL population of 138 individuals derived from the super-hybrid rice Xieyou9308 in two environments. Four significant SNPs were identified with additive, epistatic, and gene-environment interaction effects. Our study showed that the mixed linear model approach combined with the GPU-based GMDR algorithm is a feasible strategy for implementing GWAS to uncover the genetic architecture of complex traits in crops. Haiming Xu, Beibei Jiang, Yujie Cao, Yingxin Zhang, Xiaodeng Zhan, Xihong Shen, Shihua Cheng, Xiangyang Lou, and Liyong Cao Copyright © 2015 Haiming Xu et al. All rights reserved. Systems Biology Approaches to Mining High Throughput Biological Data Tue, 04 Aug 2015 11:22:20 +0000 Fang-Xiang Wu, Min Li, Jishou Ruan, and Feng Luo Copyright © 2015 Fang-Xiang Wu et al. All rights reserved. Dynamic Model for RNA-seq Data Analysis Tue, 04 Aug 2015 11:20:34 +0000 By measuring messenger RNA levels for all genes in a sample, RNA-seq provides an attractive option for characterizing global changes in transcription, and it is becoming a widely used platform for gene expression profiling. However, real transcription signals in RNA-seq data are confounded with measurement and sequencing errors and other random biological/technical variation. To extract the biologically useful transcription process from RNA-seq data, we propose using a second-order ordinary differential equation (ODE) to model the RNA-seq data. We use differential principal analysis to develop statistical methods for estimating the location-varying coefficients of the ODE. We validate the accuracy of the ODE model in fitting the RNA-seq data by prediction analysis and 5-fold cross validation.
To further evaluate the performance of the ODE model for RNA-seq data analysis, we used the location-varying coefficients of the second-order ODE as features to classify normal and tumor cells. We demonstrate that even using the ODE model for a single gene, we can achieve high classification accuracy. We also conduct response analysis to investigate how the transcription process responds to perturbations in external signals and identify dozens of genes that are related to cancer. Lerong Li and Momiao Xiong Copyright © 2015 Lerong Li and Momiao Xiong. All rights reserved. Robust Association Tests for the Replication of Genome-Wide Association Studies Tue, 04 Aug 2015 11:15:36 +0000 In genome-wide association studies (GWAS), robust genetic association tests, such as the maximum of three Cochran-Armitage trend tests (CATTs) corresponding to the recessive, additive, and dominant genetic models (MAX3); the minimum of the p values of Pearson’s chi-square test with 2 degrees of freedom and the CATT under the additive genetic model (MIN2); and the genetic model selection (GMS) and genetic model exclusion (GME) methods, have been shown to provide better power across a wide range of underlying genetic models. In this paper, we demonstrate how these robust tests can be applied to the replication study of GWAS and how the overall statistical significance can be evaluated using a combined test formed from the p values of the discovery and replication studies. Jungnam Joo, Ju-Hyun Park, Bora Lee, Boram Park, Sohee Kim, Kyong-Ah Yoon, Jin Soo Lee, and Nancy L. Geller Copyright © 2015 Jungnam Joo et al. All rights reserved. Clique-Based Clustering of Correlated SNPs in a Gene Can Improve Performance of Gene-Based Multi-Bin Linear Combination Test Tue, 04 Aug 2015 10:59:47 +0000 Gene-based analysis of multiple single nucleotide polymorphisms (SNPs) in a gene region is an alternative to single SNP analysis.
The multi-bin linear combination test (MLC) proposed in previous studies utilizes the correlation among SNPs within a gene to construct a gene-based global test. SNPs are partitioned into clusters of highly correlated SNPs, and the MLC test statistic quadratically combines linear combination statistics constructed for each cluster. The test has degrees of freedom equal to the number of clusters and can be more powerful than a fully quadratic or fully linear test statistic. In this study, we develop a new SNP clustering algorithm designed to find cliques, which are complete subnetworks of SNPs with all pairwise correlations above a threshold. We evaluate the performance of the MLC test using the clique-based CLQ algorithm versus the tag-SNP-based LDSelect algorithm. In our numerical power calculations, we observed that the two clustering algorithms produce identical clusters about 40%-60% of the time, yielding similar power on average. However, because the CLQ algorithm tends to produce smaller clusters with stronger positive correlation, the MLC test is less likely to be affected by the occurrence of opposing signs in the individual SNP effect coefficients. Yun Joo Yoo, Sun Ah Kim, and Shelley B. Bull Copyright © 2015 Yun Joo Yoo et al. All rights reserved. Identifying and Assessing Interesting Subgroups in a Heterogeneous Population Mon, 03 Aug 2015 13:21:57 +0000 Biological heterogeneity is common in many diseases, and it is often the reason for therapeutic failures. Thus, there is great interest in classifying a disease into subtypes that have clinical significance in terms of prognosis or therapy response. One of the most popular methods to uncover unrecognized subtypes is cluster analysis. However, classical clustering methods such as k-means clustering or hierarchical clustering are not guaranteed to produce clinically interesting subtypes.
This could be because the main statistical variability (the basis of cluster generation) is dominated by genes not associated with the clinical phenotype of interest. Furthermore, a strong prognostic factor might be relevant for a certain subgroup but not for the whole population; thus an analysis of the whole sample may not reveal this prognostic factor. To address these problems, we investigate methods to identify and assess clinically interesting subgroups in a heterogeneous population. The identification step uses a clustering algorithm, and significance is assessed with a false discovery rate- (FDR-) based measure. Under the heterogeneity condition, the standard FDR estimate is shown to overestimate the true FDR value, but this is remedied by an improved FDR estimation procedure. As illustrations, two real data examples from gene expression studies of lung cancer are provided. Woojoo Lee, Andrey Alexeyenko, Maria Pernemalm, Justine Guegan, Philippe Dessen, Vladimir Lazar, Janne Lehtiö, and Yudi Pawitan Copyright © 2015 Woojoo Lee et al. All rights reserved. Detecting Genetic Interactions for Quantitative Traits Using -Spacing Entropy Measure Mon, 03 Aug 2015 13:10:36 +0000 A number of statistical methods for detecting gene-gene interactions have been developed in genetic association studies with binary traits. However, many phenotype measures are intrinsically quantitative, and categorizing continuous traits may not always be straightforward or meaningful. The association of gene-gene interactions with an observed distribution of such phenotypes needs to be investigated directly, without categorization. Information gain based on an entropy measure has previously been successful in identifying genetic associations with binary traits. We extend the usefulness of this information gain by proposing a nonparametric method for evaluating the conditional entropy of a quantitative phenotype associated with a given genotype.
Hence, the information gain can be obtained for any phenotype distribution. Because no functional form, such as a Gaussian, is assumed for the entire distribution of a trait or for a given genotype, this method is expected to be robust enough to be applied to any phenotypic association data. Here, we show that it successfully identifies the main effect, as well as the genetic interactions, associated with a quantitative trait. Jaeyong Yee, Min-Seok Kwon, Seohoon Jin, Taesung Park, and Mira Park Copyright © 2015 Jaeyong Yee et al. All rights reserved. A Comparative Study on Multifactor Dimensionality Reduction Methods for Detecting Gene-Gene Interactions with the Survival Phenotype Mon, 03 Aug 2015 13:06:31 +0000 Genome-wide association studies (GWAS) have extensively analyzed single SNP effects on a wide variety of common and complex diseases and found many genetic variants associated with diseases. However, a large portion of the genetic variation remains unexplained. This missing heritability problem might be due to an analytical strategy that limits analyses to single SNPs. One possible approach to the missing heritability problem is to identify multi-SNP effects or gene-gene interactions. The multifactor dimensionality reduction (MDR) method has been widely used to detect gene-gene interactions based on constructive induction, classifying high-dimensional genotype combinations into a one-dimensional variable with two attributes, high risk and low risk, for case-control studies. Many modifications of MDR have been proposed, and MDR has also been extended to the survival phenotype. In this study, we propose several extensions of MDR for the survival phenotype and compare them with earlier MDR methods through comprehensive simulation studies. Seungyeoun Lee, Yongkang Kim, Min-Seok Kwon, and Taesung Park Copyright © 2015 Seungyeoun Lee et al. All rights reserved.
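The high-risk/low-risk pooling step of case-control MDR mentioned in the Lee et al. abstract above can be sketched as follows. This is a minimal illustration of the original case-control idea only, not the survival-phenotype extensions the study proposes: each multi-locus genotype cell is labelled high risk when its case:control ratio meets or exceeds a threshold, which here defaults to the overall case:control ratio in the sample (a common choice, assumed rather than taken from the abstract). The function names and data layout are hypothetical.

```python
from collections import defaultdict


def mdr_risk_labels(genotypes, status, threshold=None):
    """Label each multi-locus genotype cell as high (1) or low (0) risk.

    genotypes: one tuple per subject, e.g. (0, 2) = genotype codes at two SNPs
    status:    1 for a case, 0 for a control, aligned with genotypes
    threshold: case:control ratio cutoff; defaults to the overall ratio
               (this default requires at least one control in the sample)
    """
    cases, controls = defaultdict(int), defaultdict(int)
    for cell, is_case in zip(genotypes, status):
        if is_case:
            cases[cell] += 1
        else:
            controls[cell] += 1
    if threshold is None:
        n_cases = sum(status)
        threshold = n_cases / (len(status) - n_cases)
    labels = {}
    for cell in set(cases) | set(controls):
        # a cell observed only in cases has an infinite ratio and is labelled high risk
        ratio = cases[cell] / controls[cell] if controls[cell] else float("inf")
        labels[cell] = 1 if ratio >= threshold else 0  # "meets or exceeds" the threshold
    return labels


def mdr_attribute(genotypes, labels):
    """Collapse each subject's genotype combination into the one-dimensional high/low-risk variable."""
    return [labels[cell] for cell in genotypes]
```

In full MDR this labelling is repeated for every candidate SNP combination, and the resulting one-dimensional attribute is scored, typically with cross-validation; that search loop is omitted here.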
On the Estimation of Heritability with Family-Based and Population-Based Samples Mon, 03 Aug 2015 13:00:35 +0000 For a family-based sample, the phenotypic variance-covariance matrix can be parameterized to include the variance of a polygenic effect, which is then estimated using variance component analysis. However, with the advent of large-scale genomic data, the genetic relationship matrix (GRM) can be estimated and used to parameterize the variance of a polygenic effect for population-based samples. Therefore, narrow-sense heritability, which is both population and trait specific, can be estimated with both population- and family-based samples. In this study, we estimate heritability from both family-based and population-based samples collected in Korea; the heritability estimates from the pooled samples were 0.60 for height, 0.32 for body mass index (BMI), 0.24 for log-transformed triglycerides (log TG), 0.30 for total cholesterol (TCHL), 0.38 for high-density lipoprotein (HDL), 0.29 for low-density lipoprotein (LDL), 0.23 for systolic blood pressure (SBP), and 0.24 for diastolic blood pressure (DBP). Furthermore, we found differences depending on how heritability is estimated; in particular, the amount of variance attributable to the common environment in twins can be substantial, which indicates that heritability estimates should be interpreted with caution. Youngdoe Kim, Young Lee, Sungyoung Lee, Nam Hee Kim, Jeongmin Lim, Young Jin Kim, Ji Hee Oh, Haesook Min, Meehee Lee, Hyeon-Jeong Seo, So-Hyun Lee, Joohon Sung, Nam H. Cho, Bong-Jo Kim, Bok-Ghee Han, Robert C. Elston, Sungho Won, and Juyoung Lee Copyright © 2015 Youngdoe Kim et al. All rights reserved. Differential Expression Analysis in RNA-Seq by a Naive Bayes Classifier with Local Normalization Mon, 03 Aug 2015 11:48:07 +0000 To improve the applicability of RNA-seq technology, a large number of RNA-seq data analysis methods and correction algorithms have been developed.
Although these new methods and algorithms have steadily improved transcriptome analysis, greater prediction accuracy is needed to better guide experimental designs with computational results. In this study, a new tool for identifying differentially expressed genes from RNA-seq data, named GExposer, was developed. This tool introduces a local normalization algorithm to reduce the bias caused by nonrandomly positioned read depth. A naive Bayes classifier is employed to integrate fold change, transcript length, and GC content to identify differentially expressed genes. Results on several independent tests show that GExposer performs better than other methods. The combination of the local normalization algorithm and the naive Bayes classifier with three attributes achieves better results, reducing both false positive and false negative rates. However, only a small portion of genes is affected by the local normalization and GC content correction. Yongchao Dou, Xiaomei Guo, Lingling Yuan, David R. Holding, and Chi Zhang Copyright © 2015 Yongchao Dou et al. All rights reserved. K-Profiles: A Nonlinear Clustering Method for Pattern Detection in High Dimensional Data Mon, 03 Aug 2015 11:22:13 +0000 With modern technologies such as microarray, deep sequencing, and liquid chromatography-mass spectrometry (LC-MS), it is possible to measure the expression levels of thousands of genes/proteins simultaneously to unravel important biological processes. A first step towards elucidating hidden patterns and understanding such massive data is the application of clustering techniques. Nonlinear relations, which have been largely unexploited compared with linear correlations, are prevalent in high-throughput data. In many cases, nonlinear relations can model the biological relationship more precisely and reflect critical patterns in the biological systems.
Using the general dependency measure Distance Based on Conditional Ordered List (DCOL) that we introduced previously, we designed the nonlinear K-profiles clustering method, which can be seen as the nonlinear counterpart of the k-means clustering algorithm. The method has a built-in statistical testing procedure that ensures genes not belonging to any cluster do not affect the estimation of cluster profiles. Results from extensive simulation studies showed that K-profiles clustering not only outperformed the traditional linear k-means algorithm but also performed significantly better than our previous General Dependency Hierarchical Clustering (GDHC) algorithm. We further analyzed a gene expression dataset, on which K-profiles clustering generated biologically meaningful results. Kai Wang, Qing Zhao, Jianwei Lu, and Tianwei Yu Copyright © 2015 Kai Wang et al. All rights reserved. ProSim: A Method for Prioritizing Disease Genes Based on Protein Proximity and Disease Similarity Mon, 03 Aug 2015 10:49:13 +0000 Predicting disease genes for a particular genetic disease is very challenging in bioinformatics. Current research suggests that this challenge can be tackled via network-based approaches. Furthermore, it has been highlighted that disease similarity must be considered along with a protein’s proximity to disease genes in a protein-protein interaction (PPI) network in order to improve the accuracy of disease gene prioritization. In this study, we propose a new algorithm called proximity disease similarity algorithm (ProSim), which takes both of these properties into consideration to prioritize disease genes. To illustrate the proposed algorithm, we conducted six case studies, namely, prostate cancer, Alzheimer’s disease, diabetes mellitus type 2, breast cancer, colorectal cancer, and lung cancer.
We employed leave-one-out cross validation, mean enrichment, tenfold cross validation, and ROC curves to evaluate our proposed method and other existing methods. The results show that our proposed method outperforms existing methods such as PRINCE, RWR, and DADA. Gamage Upeksha Ganegoda, Yu Sheng, and Jianxin Wang Copyright © 2015 Gamage Upeksha Ganegoda et al. All rights reserved. Screening Ingredients from Herbs against Pregnane X Receptor in the Study of Inductive Herb-Drug Interactions: Combining Pharmacophore and Docking-Based Rank Aggregation Mon, 03 Aug 2015 09:57:17 +0000 Herb-drug interactions have been widely reported. Herbal ingredients can activate nuclear receptors and thereby alter the expression of genes encoding drug-metabolizing enzymes and/or transporters. Herb-drug interactions can therefore occur when herbs and drugs are coadministered; such interactions are called inductive herb-drug interactions. Pregnane X Receptor (PXR) and its drug-metabolizing target genes are involved in most inductive herb-drug interactions. To predict this kind of herb-drug interaction, the protocol can be simplified to screening herbs only for PXR agonists, because the relationships between drugs and their metabolizing enzymes are well studied. Here, a combined in silico strategy of pharmacophore modelling and docking-based rank aggregation (DRA) was employed to identify PXR agonists. First, 305 of 820 ingredients were selected as candidate PXR agonists with our pharmacophore model. Second, DRA was used to rerank the results of pharmacophore filtering. To validate our prediction, a curated herb-drug interaction database recording 380 herb-drug interactions was built. Finally, among the top 10 herb ingredients in the ranking list, 6 have been reported to be involved in herb-drug interactions. The accuracy of our method is higher than that of other traditional methods.
The strategy could be extended to studies of other inductive herb-drug interactions. Zhijie Cui, Hong Kang, Kailin Tang, Qi Liu, Zhiwei Cao, and Ruixin Zhu Copyright © 2015 Zhijie Cui et al. All rights reserved. Gene Signature of Human Oral Mucosa Fibroblasts: Comparison with Dermal Fibroblasts and Induced Pluripotent Stem Cells Mon, 03 Aug 2015 09:43:12 +0000 Oral mucosa is a useful material for regeneration therapy, with the advantages of accessibility and versatility regardless of age and gender. However, little is known about the molecular characteristics of oral mucosa. Here we report the first comparative profiles of the gene signatures of human oral mucosa fibroblasts (hOFs), human dermal fibroblasts (hDFs), and hOF-derived induced pluripotent stem cells (hOF-iPSCs), linking these with biological roles by functional annotation and pathway analyses. As a common feature of fibroblasts, both hOFs and hDFs expressed glycolipid metabolism-related genes at higher levels than hOF-iPSCs. Distinct characteristics of hOFs compared with hDFs included high expression of glycoprotein genes involved in signaling, extracellular matrix, membrane, and receptor proteins, as well as low expression of HOX genes, which are hDF markers. The results of the pathway analyses indicated that tissue-reconstructive, proliferative, and signaling pathways are active, whereas senescence-related genes in the p53 pathway are inactive in hOFs. Furthermore, more than half of the hOF-specific genes were expressed at levels similar to those in hOF-iPSCs and might be controlled by WNT signaling. Our findings demonstrate that hOFs have unique cellular characteristics in terms of specificity and plasticity. These data may provide useful insight into the application of oral fibroblasts for direct reprogramming. Keiko Miyoshi, Taigo Horiguchi, Ayako Tanimura, Hiroko Hagita, and Takafumi Noma Copyright © 2015 Keiko Miyoshi et al. All rights reserved.
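The Cui et al. abstract above reranks pharmacophore hits by docking-based rank aggregation (DRA) but does not spell out the aggregation rule. As a stand-in only, the sketch below uses mean-rank (Borda-style) aggregation over several rankings of the same candidates, for example one ranking per docking scoring function. The function name, the input format, and the choice of mean rank are our assumptions, not the authors' DRA procedure.

```python
def aggregate_ranks(rankings):
    """Combine several rankings of the same candidates by mean rank (a Borda-style rule).

    rankings: list of lists, each ordering the same candidate IDs from best to worst
              (e.g. one list per scoring function). Every list is assumed to contain
              the same candidates.
    Returns the candidate IDs sorted by mean rank, best first; ties are broken by ID.
    """
    totals = {}
    for ranking in rankings:
        for position, candidate in enumerate(ranking, start=1):
            totals[candidate] = totals.get(candidate, 0) + position
    n = len(rankings)
    return sorted(totals, key=lambda c: (totals[c] / n, c))
```

For example, with three rankings ["a", "b", "c"], ["b", "a", "c"], and ["a", "c", "b"], the mean ranks are 4/3, 2, and 8/3, so the aggregate order is a, b, c. Mean rank is only one of many aggregation rules; weighted or robust variants may behave differently when scoring functions disagree.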
Improving the Mapping of Smith-Waterman Sequence Database Searches onto CUDA-Enabled GPUs Mon, 03 Aug 2015 09:33:16 +0000 Sequence alignment lies at the heart of bioinformatics. The Smith-Waterman algorithm is one of the key sequence search algorithms and has gained popularity due to improved implementations and rapidly increasing compute power. Recently, the Smith-Waterman algorithm has been successfully mapped onto the emerging general-purpose graphics processing units (GPUs). In this paper, we focus on improving this mapping, especially for short query sequences, through better use of shared memory. We evaluated the proposed method on two different platforms (Tesla C1060 and Tesla K20) and compared it with two classic methods in CUDASW++. Further, we analyzed performance with different numbers of threads and blocks. The results showed that the proposed method significantly improves the performance of the Smith-Waterman algorithm on CUDA-enabled GPUs when the numbers of blocks and threads are properly allocated. Liang-Tsung Huang, Chao-Chin Wu, Lien-Fu Lai, and Yun-Ju Li Copyright © 2015 Liang-Tsung Huang et al. All rights reserved. Similarities in Gene Expression Profiles during In Vitro Aging of Primary Human Embryonic Lung and Foreskin Fibroblasts Mon, 03 Aug 2015 08:07:15 +0000 Replicative senescence is of fundamental importance for the process of cellular aging, since it is a property of most of our somatic cells. Here, we elucidated this process by comparing gene expression changes, measured by RNA-seq, in fibroblasts originating from two different tissues, embryonic lung (MRC-5) and foreskin (HFF), at five different time points during their transition into senescence. Although the expression patterns of the two fibroblast cell lines can be clearly distinguished, similar differential expression of an ensemble of genes was found to correlate well with their transition into senescence, with only a minority of genes being cell line specific.
Clustering-based approaches further revealed common signatures between the cell lines. Investigation of mRNA expression levels at various time points during the lifespan of either fibroblast line revealed a number of monotonically up- and downregulated genes that clearly showed a strong, novel link to aging- and senescence-related processes, which might be functional. In terms of the expression profiles of genes differentially expressed with age, the common genes identified here have the potential to govern the transition into senescence of embryonic lung and foreskin fibroblasts irrespective of their different cellular origins. Shiva Marthandan, Steffen Priebe, Mario Baumgart, Marco Groth, Alessandro Cellerino, Reinhard Guthke, Peter Hemmerich, and Stephan Diekmann Copyright © 2015 Shiva Marthandan et al. All rights reserved. AcconPred: Predicting Solvent Accessibility and Contact Number Simultaneously by a Multitask Learning Framework under the Conditional Neural Fields Model Mon, 03 Aug 2015 07:44:11 +0000 Motivation. The solvent accessibility of protein residues is one of the driving forces of protein folding, while the contact number of protein residues limits the possible protein conformations. The de novo prediction of these properties from protein sequence is important for the study of protein structure and function. Although these two properties are certainly related to each other, it is challenging to exploit this dependency for prediction. Method. We present a method, AcconPred, for predicting solvent accessibility and contact number simultaneously, which is based on a shared-weight multitask learning framework under the CNF (conditional neural fields) model. The multitask learning framework on a collection of related tasks provides more accurate prediction than a framework trained on a single task only.
The CNF method not only models the complex relationship between the input features and the predicted labels but also exploits the interdependency among adjacent labels. Results. Trained on 5729 monomeric soluble globular proteins, AcconPred reached 0.68 three-state accuracy for solvent accessibility and 0.75 correlation for contact number. Tested on 105 CASP11 domains for solvent accessibility, AcconPred reached 0.64 accuracy, outperforming existing methods. Jianzhu Ma and Sheng Wang Copyright © 2015 Jianzhu Ma and Sheng Wang. All rights reserved. Module Based Differential Coexpression Analysis Method for Type 2 Diabetes Mon, 03 Aug 2015 07:40:09 +0000 A growing number of studies have shown that many complex diseases arise from the joint contribution of alterations in numerous genes. Genes often act together in functional biological pathways or networks and are highly correlated. Differential coexpression analysis, a more comprehensive technique than differential expression analysis, was proposed to study the gene regulatory networks and biological pathways underlying phenotypic changes by measuring changes in gene correlation between disease and normal conditions. In this paper, we propose a gene differential coexpression analysis algorithm at the level of gene sets and apply it to a publicly available type 2 diabetes (T2D) expression dataset. First, we calculate biweight midcorrelation coefficients of coexpression between all gene pairs. Then, we select informative correlation pairs using the “differential coexpression threshold” strategy. Finally, we identify the differential coexpression gene modules using the maximum clique concept and the k-clique algorithm. We apply the proposed differential coexpression analysis method to simulated data and T2D data. Two differential coexpression gene modules related to T2D were detected, which should be useful for exploring the biological function of the related genes.
Lin Yuan, Chun-Hou Zheng, Jun-Feng Xia, and De-Shuang Huang Copyright © 2015 Lin Yuan et al. All rights reserved. Improving Classification of Protein Interaction Articles Using Context Similarity-Based Feature Selection Mon, 03 Aug 2015 07:26:06 +0000 Protein interaction article classification is a text classification task in the biological domain that determines which articles describe protein-protein interactions. Because the feature space in text classification is high-dimensional, feature selection is widely used to reduce the dimensionality of features and speed up computation without sacrificing classification performance. Many existing feature selection methods are based on the statistical measures of document frequency and term frequency. One potential drawback of these methods is that they treat features separately. To address this, we first design a similarity measure over context information that takes word co-occurrences and phrase chunks around the features into account. We then use this context similarity in place of document and term frequency when measuring feature importance, which yields new context similarity-based feature selection methods. Their performance is evaluated on two protein interaction article collections and compared against the frequency-based methods. The experimental results reveal that the context similarity-based methods perform better in terms of the measure and the dimension reduction rate. Benefiting from the context information surrounding the features, the proposed methods can effectively select distinctive features for protein interaction article classification. Yifei Chen, Yuxing Sun, and Bing-Qing Han Copyright © 2015 Yifei Chen et al. All rights reserved.
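The Module Based Differential Coexpression abstract above (type 2 diabetes) starts by computing biweight midcorrelation (bicor) between all gene pairs. As a minimal sketch, assuming NumPy and the standard bicor definition (median centering, median absolute deviation, Tukey biweight with the conventional constant 9), the coefficient can be computed as below. The `differential_pairs` helper and its single absolute-change cutoff are hypothetical stand-ins; the abstract does not spell out its "differential coexpression threshold" rule.

```python
import numpy as np


def bicor(x, y):
    """Biweight midcorrelation between two 1-D expression vectors.

    Values are median-centered and down-weighted by Tukey's biweight with
    u = (v - median) / (9 * MAD); points with |u| >= 1 get zero weight.
    Assumes the MAD of each vector is nonzero.
    """
    def _weighted(v):
        v = np.asarray(v, dtype=float)
        med = np.median(v)
        mad = np.median(np.abs(v - med))
        u = (v - med) / (9.0 * mad)
        w = (1.0 - u**2) ** 2 * (np.abs(u) < 1.0)  # biweight, zero outside |u| < 1
        t = (v - med) * w
        return t / np.sqrt(np.sum(t**2))  # unit norm, so the dot product is the correlation

    return float(np.dot(_weighted(x), _weighted(y)))


def differential_pairs(expr_normal, expr_disease, threshold):
    """Hypothetical rule: keep gene pairs whose bicor changes by more than `threshold`.

    expr_*: arrays of shape (n_genes, n_samples) for each condition.
    Returns (gene_i, gene_j, disease_bicor - normal_bicor) tuples.
    """
    n_genes = expr_normal.shape[0]
    pairs = []
    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            delta = bicor(expr_disease[i], expr_disease[j]) - bicor(expr_normal[i], expr_normal[j])
            if abs(delta) > threshold:
                pairs.append((i, j, delta))
    return pairs
```

Because bicor gives zero weight to points nine or more MADs from the median, a single outlying sample cannot dominate the coefficient the way it can with Pearson correlation. The quadratic loop in `differential_pairs` is fine for illustration but would need vectorizing for genome-scale data.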
Spatially Enhanced Differential RNA Methylation Analysis from Affinity-Based Sequencing Data with Hidden Markov Model Sun, 02 Aug 2015 14:09:30 +0000 With the development of new sequencing technology, the entire N6-methyl-adenosine (m6A) RNA methylome can now be profiled without bias using methylated RNA immunoprecipitation sequencing (MeRIP-Seq), making it possible to detect differential RNA methylation states between two conditions, for example, between normal and cancerous tissue. However, as an affinity-based method, MeRIP-Seq does not yet provide base-pair resolution; that is, a single methylation site determined from MeRIP-Seq data can in practice contain multiple methylated residues, some of which can be regulated by different enzymes and thus be differentially methylated between two conditions. Since existing peak-based methods cannot effectively differentiate multiple methylated residues located within a single methylation site, we propose a hidden Markov model (HMM) based approach to address this issue. Specifically, each detected RNA methylation site is further divided into multiple small adjacent bins and then scanned at higher resolution, using a hidden Markov model to capture the dependency between spatially adjacent bins for improved accuracy. We tested the proposed algorithm on both simulated and real data. The results suggest that the proposed algorithm clearly outperforms the existing peak-based approach on simulated systems and detects differential methylation regions with higher statistical significance on the real dataset. Yu-Chen Zhang, Shao-Wu Zhang, Lian Liu, Hui Liu, Lin Zhang, Xiaodong Cui, Yufei Huang, and Jia Meng Copyright © 2015 Yu-Chen Zhang et al. All rights reserved. Advanced Computational Approaches for Medical Genetics and Genomics Thu, 30 Jul 2015 06:34:50 +0000 Zhi Wei, Xiao Chang, and Junwen Wang Copyright © 2015 Zhi Wei et al. All rights reserved.
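The MeRIP-Seq abstract above splits each methylation site into small adjacent bins and uses an HMM so that neighboring bins inform each other. A common way to decode such a model is the Viterbi algorithm. The sketch below assumes NumPy, a two-state model (0 = not differential, 1 = differential), and per-bin emission log-likelihoods that have already been computed, for example from read counts; the state labels, transition values, and emission inputs are illustrative assumptions, not the authors' parameters.

```python
import numpy as np


def viterbi(log_emit, log_trans, log_init):
    """Most likely hidden-state path of an HMM, computed in log space.

    log_emit:  (n_bins, n_states) log-likelihood of each bin's data under each state
    log_trans: (n_states, n_states) log transition probabilities, row = from-state
    log_init:  (n_states,) log initial-state probabilities
    """
    n_bins, n_states = log_emit.shape
    score = np.empty((n_bins, n_states))
    back = np.zeros((n_bins, n_states), dtype=int)
    score[0] = log_init + log_emit[0]
    for t in range(1, n_bins):
        # cand[i, j]: best path ending in state i at bin t-1, then moving to state j
        cand = score[t - 1][:, None] + log_trans
        back[t] = np.argmax(cand, axis=0)
        score[t] = cand[back[t], np.arange(n_states)] + log_emit[t]
    # Trace the best final state back through the stored pointers.
    path = [int(np.argmax(score[-1]))]
    for t in range(n_bins - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]


if __name__ == "__main__":
    trans = np.log([[0.9, 0.1], [0.1, 0.9]])  # "sticky" transitions between adjacent bins
    init = np.log([0.5, 0.5])
    emit = np.log([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9], [0.1, 0.9]])
    print(viterbi(emit, trans, init))  # [0, 0, 1, 1, 1]
```

With sticky transitions, a single weakly differential bin surrounded by clearly non-differential bins is absorbed into the surrounding state, which is the kind of spatial smoothing the abstract attributes to modeling dependency between adjacent bins.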
Identification of Gene Biomarkers for Distinguishing Small-Cell Lung Cancer from Non-Small-Cell Lung Cancer Using a Network-Based Approach Tue, 28 Jul 2015 08:11:43 +0000 Lung cancer consists of two main subtypes, small-cell lung cancer (SCLC) and non-small-cell lung cancer (NSCLC), which are classified according to their physiological phenotypes. In this study, we have developed a network-based approach to identify molecular biomarkers that can distinguish SCLC from NSCLC. By identifying positive and negative coexpression gene pairs in normal lung tissue, SCLC, and NSCLC samples and using functional association information from the STRING network, we first construct a lung cancer-specific gene association network. From the network, we obtain gene modules in which genes are highly functionally associated with each other and are either positively or negatively coexpressed in the three conditions. Then, we identify gene modules that are not only differentially expressed between cancer and normal samples but also show distinctive expression patterns between SCLC and NSCLC. Finally, we select the genes inside those modules that have discriminating coexpression patterns between the two lung cancer subtypes and predict them as candidate biomarkers of diagnostic use. Fei Long, Jia-Hang Su, Bin Liang, Li-Li Su, and Shu-Juan Jiang Copyright © 2015 Fei Long et al. All rights reserved. Network-Based Association Study of Obesity and Type 2 Diabetes with Gene Expression Profiles Mon, 27 Jul 2015 07:52:11 +0000 The increasing prevalence of obesity and type 2 diabetes (T2D) has become an important factor affecting human health. Obesity is commonly considered a major risk factor for the development of T2D. However, the molecular mechanisms linking the two diseases are not yet well understood.
In this study, the combination of multiple differential expression profiles and a comprehensive biological network of obesity and T2D allowed us to identify and compare the disease-responsive active modules and subclusters. The results demonstrated that the connection between obesity and T2D mainly relies on several pathways involved in digestive metabolism, immunity, and signal transduction, such as the adipocytokine, chemokine, T cell receptor, and MAPK signaling pathways. The relationships of almost all of these pathways with obesity and T2D have been verified individually in previous reports. We also found that different parts of the same pathway are activated in obesity and in T2D. An association among cancer, obesity, and T2D was also identified. In conclusion, our network-based method not only provides stronger support for the close connection between obesity and T2D but also provides a systemic view for understanding the molecular functions underlying the links. It should be helpful in the development of new therapies for obesity, T2D, and the associated diseases. Siyi Zhang, Bo Wang, Jingsong Shi, and Jing Li Copyright © 2015 Siyi Zhang et al. All rights reserved. Gene Coexpression and Evolutionary Conservation Analysis of the Human Preimplantation Embryos Mon, 27 Jul 2015 07:05:30 +0000 Evolutionary developmental biology (EVO-DEVO) tries to decode the evolutionary constraints on the stages of embryonic development. Two models, the “funnel-like” model and the “hourglass” model, have been proposed to illustrate the fluctuation of selective pressure across these stages. However, the selective indices of the stages corresponding to mammalian preimplantation embryonic development (PED) were not determined in previous studies. Using single-cell RNA sequencing data from the stages of human PED, we applied a coexpression method to identify the gene modules activated in each stage.
By measuring the evolutionary indices of the gene modules belonging to each stage, we observed for the first time the pattern of change in selective constraints during PED. Selective pressure decreases from the zygote stage to the 4-cell stage, increases at the 8-cell stage, and then decreases again from the 8-cell stage to the late blastocyst stage. Previous EVO-DEVO studies of whole-embryo development neglected this fluctuation of selective pressure in the earlier stages, and the fluctuation is potentially correlated with early-stage events such as zygotic genome activation (ZGA). Such oscillation at an early stage could further affect models of the evolutionary constraints on whole-embryo development. Therefore, these earlier stages should be examined more intensively in future EVO-DEVO studies. Tiancheng Liu, Lin Yu, Guohui Ding, Zhen Wang, Lei Liu, Hong Li, and Yixue Li Copyright © 2015 Tiancheng Liu et al. All rights reserved. Statistical Genomic Approach Identifies Association between FSHR Polymorphisms and Polycystic Ovary Morphology in Women with Polycystic Ovary Syndrome Sun, 26 Jul 2015 12:40:17 +0000 Background. Single-nucleotide polymorphisms (SNPs) in the follicle stimulating hormone receptor (FSHR) gene are associated with polycystic ovary syndrome (PCOS). However, their relationship to polycystic ovary (PCO) morphology remains unknown. This study aimed to investigate whether PCOS-related SNPs in the FSHR gene are associated with PCO in women with PCOS. Methods. Patients were grouped into PCO and non-PCO groups. Genotypes were profiled using the Affymetrix human genome SNP chip 6. Two polymorphisms of FSHR (rs2268361 and rs2349415) were analyzed using a statistical approach. Results.
Significant differences in the genotype distribution of rs2268361 were found between the PCO and non-PCO groups (27.6% GG, 53.4% GA, and 19.0% AA versus 33.3% GG, 36.5% GA, and 30.2% AA), whereas no significant differences were found in the genotype distribution of rs2349415. For rs2268361, serum follicle stimulating hormone, estradiol, and sex hormone binding globulin differed significantly between genotypes in the PCO group. For the rs2349415 SNP, only serum sex hormone binding globulin differed significantly between genotypes in the PCO group. Conclusions. Functional variants in the FSHR gene may contribute to PCO susceptibility in women with PCOS. Tao Du, Yu Duan, Kaiwen Li, Xiaomiao Zhao, Renmin Ni, Yu Li, and Dongzi Yang Copyright © 2015 Tao Du et al. All rights reserved. Deciphering the Correlation between Breast Tumor Samples and Cell Lines by Integrating Copy Number Changes and Gene Expression Profiles Sun, 26 Jul 2015 11:35:30 +0000 Breast cancer is one of the most common cancers worldwide, with high incidence and mortality rates. Although breast cancer cell lines are widely used in laboratory investigations, evidence accumulated over the past decades indicates that genomic differences exist between cancer cell lines and tissue samples. The abundant molecular profiles of cancer cell lines and tumor samples deposited in the Cancer Cell Line Encyclopedia and The Cancer Genome Atlas now allow a systematic comparison of breast cancer cell lines with breast tumors. We characterized primary breast tumors based on their copy number variation and gene expression profiles and compared the breast cancer cell lines with different subgroups of breast tumors.
We found that some breast cancer cell lines correlate highly with the tumor group expected from previous knowledge, while a large proportion do not, including the most widely used lines MCF7, MDA-MB-231, and T-47D. We present a computational framework to identify the cell lines that most closely resemble a given tumor group for breast tumor studies. Our investigation provides a useful guide for bridging the gap between cell lines and tumors and helps to select the most suitable cell line models for personalized cancer studies. Yi Sun and Qi Liu Copyright © 2015 Yi Sun and Qi Liu. All rights reserved. Network Comparison of Inflammation in Colorectal Cancer and Alzheimer’s Disease Sun, 26 Jul 2015 09:45:50 +0000 Recently, a large clinical study revealed an inverse correlation between an individual’s risk of cancer and risk of Alzheimer’s disease (AD). No molecular explanation yet exists for this anticorrelation; however, inflammation is crucial to the pathogenesis of both diseases, so it is necessary to understand how signaling usage differs between the inflammatory responses of the two diseases. Using a subpathway analysis approach, we identified numerous well-known and previously unknown pathways enriched in datasets from both diseases. Here, we present the quantitative importance of the inflammatory response in the two disease pathologies and summarize the signal transduction pathways common to both diseases that are affected by inflammation. Sungjin Park, Seok Jong Yu, Yongseong Cho, Curt Balch, Jinhyuk Lee, Yon Hui Kim, and Seungyoon Nam Copyright © 2015 Sungjin Park et al. All rights reserved. The Current and Future Use of Ridge Regression for Prediction in Quantitative Genetics Sun, 26 Jul 2015 09:44:05 +0000 In recent years, there has been a considerable amount of research on the use of regularization methods for inference and prediction in quantitative genetics. Such research mostly focuses on the selection of markers and the shrinkage of their effects.
In this review paper, we discuss the use of ridge regression for prediction in quantitative genetics using single-nucleotide polymorphism (SNP) data. In particular, we consider (i) the theoretical foundations of ridge regression, (ii) its link to commonly used methods in animal breeding, (iii) its computational feasibility, and (iv) the scope for constructing prediction models with nonlinear effects (e.g., dominance and epistasis). Based on a simulation study, we gauge the current and future potential of ridge regression for predicting human traits using genome-wide SNP data. We conclude that, for outcomes with a relatively simple genetic architecture, given the current sample sizes of most cohorts, the predictive accuracy of ridge regression is slightly higher than that of the classical genome-wide association study approach of repeated simple regression (i.e., one regression per SNP). However, both capture only a small proportion of the heritability. Nevertheless, we find evidence that large-scale initiatives, such as biobanks, can achieve sample sizes at which ridge regression substantially improves predictive accuracy compared with the classical approach. Ronald de Vlaming and Patrick J. F. Groenen Copyright © 2015 Ronald de Vlaming and Patrick J. F. Groenen. All rights reserved. FARMS: A New Algorithm for Variable Selection Sun, 26 Jul 2015 07:39:02 +0000 Large datasets with an extensive number of covariates are now generated in many different situations, for instance, in detailed genetic studies of outbred human populations or in complex analyses of immune responses to different infections. To inform clinical interventions or vaccine design, variable selection methods that identify the variables with optimal predictive performance for a specific outcome are crucial. However, testing all potential subsets of variables is not feasible, and alternatives to existing methods are needed.
Here, we describe a new method for handling such complex datasets, called FARMS, which combines forward and all-subsets regression for model selection. We apply FARMS to a host genetic and immunological dataset of over 800 HIV-infected individuals from Lima (Peru) and Durban (South Africa) who were tested for antiviral immune responses. This dataset includes more than 500 explanatory variables: around 400 variables with information on HIV immune reactivity and around 100 individual genetic characteristics. We implemented FARMS in the R statistical language and showed that it is fast and outperforms other comparable, commonly used approaches, thus providing a new tool for the thorough analysis of complex datasets without the need for massive computational infrastructure. Susana Perez-Alvarez, Guadalupe Gómez, and Christian Brander Copyright © 2015 Susana Perez-Alvarez et al. All rights reserved. Prediction of MicroRNA-Disease Associations Based on Social Network Analysis Methods Sun, 26 Jul 2015 07:38:47 +0000 MicroRNAs constitute an important class of noncoding, single-stranded, ~22-nucleotide-long RNA molecules encoded by endogenous genes. They play an important role in regulating gene expression and normal development. MicroRNAs can be associated with disease; however, only a few microRNA-disease associations have been confirmed by traditional experimental approaches. We introduce two methods to predict microRNA-disease associations. The first method, KATZ, integrates social network analysis with machine learning and is based on networks derived from known microRNA-disease associations, disease-disease associations, and microRNA-microRNA associations. The other method, CATAPULT, is a supervised machine learning method. We applied the two methods to 242 known microRNA-disease associations and evaluated their performance using leave-one-out cross-validation and 3-fold cross-validation.
Experiments showed that our methods outperformed state-of-the-art methods. Quan Zou, Jinjin Li, Qingqi Hong, Ziyu Lin, Yun Wu, Hua Shi, and Ying Ju Copyright © 2015 Quan Zou et al. All rights reserved. Low-Rank and Sparse Matrix Decomposition for Genetic Interaction Data Sun, 26 Jul 2015 07:34:19 +0000 Background. Epistatic miniarray profile (EMAP) studies have enabled the mapping of large-scale genetic interaction networks and generated large amounts of data in model organisms. One approach to analyzing EMAP data is to identify gene modules with densely interacting genes. In addition, the genetic interaction score (S score) reflects the degree of synergistic or mitigating effect of two mutants, which is also informative. Statistical approaches that exploit both modularity and the pairwise interactions may provide more insight into the underlying biology. However, the high missing rate in EMAP data hinders the development of such approaches. To address this problem, we adopted the matrix decomposition methodology “low-rank and sparse decomposition” (LRSDec) to decompose the EMAP data matrix into a low-rank part and a sparse part. Results. LRSDec proved to be an effective technique for analyzing EMAP data. We applied it to a synthetic dataset and to an EMAP dataset studying RNA-related processes in Saccharomyces cerevisiae. Global views of the genetic cross talk between different RNA-related protein complexes and processes were constructed, and novel gene functions were predicted. Yishu Wang, Dejie Yang, and Minghua Deng Copyright © 2015 Yishu Wang et al. All rights reserved. Biometrics and Biosecurity 2014 Tue, 21 Jul 2015 11:00:24 +0000 Tai-hoon Kim, Sabah Mohammed, Wai-Chi Fang, and Carlos Ramos Copyright © 2015 Tai-hoon Kim et al. All rights reserved.
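The ridge regression review a few entries above lists computational feasibility and the link to animal-breeding methods among its themes. One standard way to see both is that the ridge estimate can be computed either from a p x p system or, equivalently, from an n x n system, which is much cheaper when SNPs far outnumber individuals. The sketch below assumes NumPy, a column-centered genotype matrix `X` (n individuals by p SNPs), a phenotype vector `y`, and a fixed penalty `lam`; it illustrates the algebra and is not the authors' code.

```python
import numpy as np


def ridge_primal(X, y, lam):
    """beta = (X'X + lam * I_p)^(-1) X'y: solves a p x p system, cheap when p << n."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def ridge_dual(X, y, lam):
    """beta = X'(XX' + lam * I_n)^(-1) y: solves an n x n system, cheap when p >> n."""
    n = X.shape[0]
    alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), y)
    return X.T @ alpha


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: 50 individuals, 2,000 SNPs coded 0/1/2, then column-centered.
    X = rng.integers(0, 3, size=(50, 2000)).astype(float)
    X -= X.mean(axis=0)
    y = X[:, :10] @ rng.normal(size=10) + rng.normal(size=50)
    b_dual = ridge_dual(X, y, lam=100.0)
    b_primal = ridge_primal(X, y, lam=100.0)  # 2,000 x 2,000 solve; noticeably slower
    print(np.max(np.abs(b_dual - b_primal)))  # agreement up to floating-point error
```

The n x n matrix XX' in the dual form is, up to scaling, a genomic relationship matrix, which is the usual route from SNP-level ridge regression to GBLUP in animal breeding. Choosing `lam` (for example by cross-validation) is left out of this sketch.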
A Multilayer Secure Biomedical Data Management System for Remotely Managing a Very Large Number of Diverse Personal Healthcare Devices Mon, 13 Jul 2015 08:25:22 +0000 In this paper, we propose a multilayer secure biomedical data management system for managing a very large number of diverse personal health devices (PHDs). The system has the following characteristics. First, it supports international standard communication protocols to achieve interoperability. Second, it is integrated in the sense that a PHD communication system and a remote PHD management system work together as a single system. Finally, it provides user/message authentication processes, based on the concept of a biomedical signature, to securely transmit the biomedical data measured by PHDs. Experiments, including a stress test, show that the proposed system performs very well even when a very large number of PHDs are used. In the stress test, up to 1,200 threads were created to represent the same number of PHD agents. The loss ratio of ISO/IEEE 11073 messages in the normal system is as high as 14% when 1,200 PHD agents are connected. In contrast, no message loss occurs in the proposed multilayered system, which demonstrates its superiority over the normal system under heavy traffic. KeeHyun Park and SeungHyeon Lim Copyright © 2015 KeeHyun Park and SeungHyeon Lim. All rights reserved. Towards a Food Safety Knowledge Base Applicable in Crisis Situations and Beyond Mon, 13 Jul 2015 08:23:27 +0000 In case of contamination in the food chain, fast action is required to reduce the number of affected people. In such situations, being able to predict the fate of agents in foods would help risk assessors and decision makers assess the potential effects of a specific contamination event and thus deduce the appropriate mitigation measures.
One efficient strategy supporting this is model-based simulation. However, application in crisis situations requires ready-to-use, easy-to-adapt models to be available from so-called food safety knowledge bases. Here, we illustrate this concept and its benefits by applying the modular open-source software tools PMM-Lab and FoodProcess-Lab. As a fictitious sample scenario, an intentional ricin contamination at a beef salami production facility was modelled. Predictive models describing the inactivation of ricin were reviewed, relevant models were implemented with PMM-Lab, and simulations of residual toxin amounts in the final product were performed with FoodProcess-Lab. Owing to the generic and modular modelling concept implemented in these tools, they can be applied to simulate virtually any food safety contamination scenario. Apart from the application in crisis situations, the food safety knowledge base concept will also be useful in food quality and safety investigations. Alexander Falenski, Armin A. Weiser, Christian Thöns, Bernd Appel, Annemarie Käsbohrer, and Matthias Filter Copyright © 2015 Alexander Falenski et al. All rights reserved. A Multimodal User Authentication System Using Faces and Gestures Mon, 13 Jul 2015 07:58:22 +0000 As a novel approach to user authentication, we propose a multimodal biometric system that uses faces and gestures obtained from a single vision sensor. Unlike typical multimodal biometric systems that use physical information, the proposed system uses gesture video signals combined with facial images. Whereas physical information such as the face, fingerprints, and iris is fixed and cannot be changed, behavioral information such as gestures and signatures can be freely changed by the user, similar to a password. Therefore, gestures can serve as a countermeasure when the physical information is exposed.
We investigate the potential of using gestures as a signal for biometric systems and the robustness of the proposed multimodal user authentication system. Through computational experiments on a public database, we confirm that gesture information can help to improve authentication performance. Hyunsoek Choi and Hyeyoung Park Copyright © 2015 Hyunsoek Choi and Hyeyoung Park. All rights reserved. Quantification of Hepatorenal Index for Computer-Aided Fatty Liver Classification with Self-Organizing Map and Fuzzy Stretching from Ultrasonography Mon, 13 Jul 2015 07:57:28 +0000 Accurate measures of liver fat content are essential for investigating hepatic steatosis. For noninvasive, inexpensive ultrasonographic analysis, the quantitative assessment of liver fat content must be validated so that fully automated, reliable computer-aided software can assist medical practitioners without operator subjectivity. In this study, we attempt to quantify the hepatorenal index, which reflects the difference between the liver and the kidney, across multiple severity levels of hepatic steatosis. To do this, a series of carefully designed image processing techniques, including fuzzy stretching and edge tracking, are applied to extract regions of interest. Then, an unsupervised neural learning algorithm, the self-organizing map, is designed to establish characteristic clusters in the image, and the distribution of hepatorenal index values across the different levels of fatty liver is examined experimentally to estimate how the hepatorenal index differs between levels. Such findings will be useful in building reliable computer-aided diagnostic software if combined in the future with a good set of other characteristic features and powerful machine learning classifiers. Kwang Baek Kim and Chang Won Kim Copyright © 2015 Kwang Baek Kim and Chang Won Kim. All rights reserved.
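The fatty-liver abstract above centers on the hepatorenal index. A widely used definition is the ratio of mean echo intensity in a liver region of interest to that in a kidney-cortex region; the abstract does not give the authors' exact formula, and the fuzzy stretching and self-organizing map steps that produce their regions are not reproduced here. As a minimal sketch, assuming NumPy, a grayscale frame, and boolean ROI masks already obtained by some segmentation step (the function name is hypothetical):

```python
import numpy as np


def hepatorenal_index(image, liver_mask, kidney_mask):
    """Mean liver-ROI intensity divided by mean kidney-ROI intensity.

    image: 2-D grayscale ultrasound frame; liver_mask / kidney_mask: boolean
    arrays of the same shape marking the two regions of interest.
    """
    image = np.asarray(image, dtype=float)
    liver = image[liver_mask]
    kidney = image[kidney_mask]
    if liver.size == 0 or kidney.size == 0:
        raise ValueError("both regions of interest must contain pixels")
    kidney_mean = kidney.mean()
    if kidney_mean == 0:
        raise ValueError("kidney ROI has zero mean intensity")
    return float(liver.mean() / kidney_mean)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.integers(40, 80, size=(64, 64)).astype(float)  # synthetic speckle
    liver_roi = np.zeros_like(frame, dtype=bool)
    liver_roi[10:20, 10:20] = True
    kidney_roi = np.zeros_like(frame, dtype=bool)
    kidney_roi[40:50, 40:50] = True
    frame[liver_roi] += 40  # make the synthetic "liver" brighter than the "kidney"
    print(round(hepatorenal_index(frame, liver_roi, kidney_roi), 2))
```

Higher values indicate a brighter (more echogenic) liver relative to the kidney, the pattern expected as steatosis progresses; any grading cut-offs would have to come from data such as the study's self-organizing map clusters.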
Biometrics Analysis and Evaluation on Korean Makgeolli Using Brainwaves and Taste Biological Sensor System Mon, 13 Jul 2015 07:53:08 +0000 There are several methods available for measuring food taste. Sensory evaluation, for instance, is a typical method in which panels test taste and smell by rating taste characteristics, intensity, and pleasantness. The traditional sensory evaluation method entails many issues, such as forming a panel and the cost of evaluation; moreover, it is localized to particular areas. Accordingly, this paper selects a food from one particular area, compares sensory evaluation results with those obtained using a taste biological sensor, presents an analysis of brainwaves using EEG, and finally proposes a new method for sensory evaluation. The researchers purchased eight types of rice wine (Makgeolli) and conducted a sensory evaluation in which each was scored on a scale of up to nine points. These eight types of Makgeolli were then profiled by generating multidimensional data with the TS-5000z, learning mapping points from these data, and scaling them. The contribution of this paper, therefore, is to overcome the disadvantages of sensory evaluation by using the proposed taste biological sensor system. Yong-Sung Kim and Yong-Suk Kim Copyright © 2015 Yong-Sung Kim and Yong-Suk Kim. All rights reserved. Bilateral Image Subtraction and Multivariate Models for the Automated Triaging of Screening Mammograms Thu, 09 Jul 2015 11:35:14 +0000 Mammography is the most common and effective breast cancer screening test. However, the rate of positive findings is very low, making radiologic interpretation monotonous and prone to errors. This work presents a computer-aided diagnosis (CADx) method aimed at automatically triaging mammogram sets.
The method coregisters the left and right mammograms, extracts image features, and classifies subjects as at risk of malignant calcifications (CS), at risk of malignant masses (MS), or healthy (HS). In this study, 449 subjects (197 CS, 207 MS, and 45 HS) from a public database were used to train and evaluate the CADx. Percentile-rank (p-rank) and z-normalizations were used. For the p-rank, the CS versus HS model achieved a cross-validation accuracy of 0.797 with an area under the receiver operating characteristic curve (AUC) of 0.882; the MS versus HS model obtained an accuracy of 0.772 and an AUC of 0.842. For the z-normalization, the CS versus HS model achieved an accuracy of 0.825 with an AUC of 0.882, and the MS versus HS model obtained an accuracy of 0.698 and an AUC of 0.807. The proposed method has the potential to rank cases with a high probability of malignant findings, aiding in the prioritization of radiologists’ work lists. José Celaya-Padilla, Antonio Martinez-Torteya, Juan Rodriguez-Rojas, Jorge Galvan-Tejada, Victor Treviño, and José Tamez-Peña Copyright © 2015 José Celaya-Padilla et al. All rights reserved. Functional and Structural Consequences of Damaging Single Nucleotide Polymorphisms in Human Prostate Cancer Predisposition Gene RNASEL Wed, 08 Jul 2015 09:10:00 +0000 Prostate cancer (PrCa), a commonly diagnosed cancer, is influenced by the gene RNASEL (previously known as PRCA1), which codes for ribonuclease L, an integral part of the interferon-regulated system that mediates the antiviral and antiproliferative roles of interferons. Both somatic and germline mutations have been implicated in causing prostate cancer.
Using the single nucleotide polymorphism data available in dbSNP, this study was designed to identify functional SNPs in RNASEL by applying several established computational tools, such as SIFT, PolyPhen, SNPs&GO, Fathmm, ConSurf, UTRScan, PDBsum, Tm-Align, I-Mutant, and Project HOPE, for functional and structural assessment, solvent accessibility, molecular dynamics, and energy minimization studies. Among 794 RNASEL SNP entries, 124 SNPs were nonsynonymous; of these, SIFT predicted 13 nsSNPs as nontolerable, whereas PolyPhen-2 predicted 28. SNPs in the 3′ and 5′ UTRs were also assessed. Aggregating the results of six tools with different perspectives identified nine nsSNPs most likely to exert deleterious effects. 3D models of the mutated proteins were generated to determine the functional and structural effects of the mutations on ribonuclease L. The initial findings were reinforced by I-Mutant and Project HOPE, which predicted significant structural and functional instability of the mutated proteins. The Expasy-ProSit tool placed the mutations in the functional domains of the protein. Taking the previous analyses into account, this study identified the three most damaging nsSNPs among the available database entries: rs151296858 (G59S), rs145415894 (A276V), and rs35896902 (R592H). As no similar studies of RNASEL polymorphisms could be found, the results of the current study should be helpful for future research on prostate cancer in men. Amit Datta, Md. Habibul Hasan Mazumder, Afrin Sultana Chowdhury, and Md. Anayet Hasan Copyright © 2015 Amit Datta et al. All rights reserved.
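The mammogram-triage abstract above reports results under a percentile-rank normalization and a z-normalization of the image features. The sketch below shows what those two transforms typically mean for one feature column, assuming NumPy; the tie handling and the choice of population rather than sample standard deviation are assumptions, since the abstract does not state how the authors computed them, and the function names are hypothetical.

```python
import numpy as np


def percentile_rank(x):
    """Map each value to rank / n in (0, 1]; ties are broken by position (a simplification)."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x, kind="stable"), kind="stable") + 1
    return ranks / x.size


def z_normalize(x):
    """Subtract the mean and divide by the population standard deviation; x must not be constant."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


if __name__ == "__main__":
    # Hypothetical feature matrix: rows = subjects, columns = image features.
    features = np.array([[0.2, 15.0], [0.9, 11.0], [0.4, 40.0], [0.7, 12.0]])
    p_ranked = np.apply_along_axis(percentile_rank, 0, features)  # column by column
    z_scored = np.apply_along_axis(z_normalize, 0, features)
    print(p_ranked)
    print(z_scored.round(2))
```

Percentile ranks are insensitive to outliers and to any monotone rescaling of a feature, whereas z-scores preserve relative distances; that trade-off is one plausible reason to compare both before fitting the multivariate models.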
A Fast Cluster Motif Finding Algorithm for ChIP-Seq Data Sets Sun, 05 Jul 2015 07:40:37 +0000 The high-throughput technique ChIP-seq, which couples chromatin immunoprecipitation with high-throughput sequencing, has extended the identification of transcription factor binding locations to genome-wide regions. However, most existing motif discovery algorithms are time-consuming and limited in their ability to identify binding motifs in ChIP-seq data, which are typically large in scale. To improve efficiency, we propose a fast cluster motif finding algorithm, named FCmotif, to identify motifs in large-scale ChIP-seq data sets. Inspired by the emerging substrings mining strategy, it finds enriched substrings and then searches their neighborhood instances to construct PWMs and cluster motifs of different lengths. FCmotif does not follow the OOPS (one occurrence per sequence) model constraint and can find long motifs. The effectiveness of the proposed algorithm was demonstrated by experiments on ChIP-seq data sets from mouse ES cells. Detecting the real binding motifs in the full-size data of several megabytes took only a few minutes. The experimental results show that FCmotif is advantageous for motif finding in ChIP-seq data and performs better than other widely used algorithms such as MEME, Weeder, ChIPMunk, and DREME. Yipu Zhang and Ping Wang Copyright © 2015 Yipu Zhang and Ping Wang. All rights reserved. Big Data Analytics in Healthcare Thu, 02 Jul 2015 12:06:04 +0000 The rapidly expanding field of big data analytics has started to play a pivotal role in the evolution of healthcare practices and research. It has provided tools to accumulate, manage, analyze, and assimilate large volumes of disparate, structured, and unstructured data produced by current healthcare systems.
Big data analytics has recently been applied to aid the process of care delivery and disease exploration. However, adoption and research development in this space are still hindered by some fundamental problems inherent in the big data paradigm. In this paper, we discuss some of these major challenges, focusing on three upcoming and promising areas of medical research: image-, signal-, and genomics-based analytics. Recent research that targets the utilization of large volumes of medical data while combining multimodal data from disparate sources is discussed. Potential areas of research within this field that could have a meaningful impact on healthcare delivery are also examined. Ashwin Belle, Raghuram Thiagarajan, S. M. Reza Soroushmehr, Fatemeh Navidi, Daniel A. Beard, and Kayvan Najarian Copyright © 2015 Ashwin Belle et al. All rights reserved. GESearch: An Interactive GUI Tool for Identifying Gene Expression Signature Thu, 25 Jun 2015 06:28:00 +0000 The huge amount of gene expression data generated by microarray and next-generation sequencing technologies presents challenges for exploiting its biological meaning. When searching for coexpressed genes, the data mining process is strongly affected by the choice of algorithm. Thus, it is highly desirable to offer multiple algorithms in a user-friendly analytical toolkit for exploring gene expression signatures. For this purpose, we developed GESearch, an interactive graphical user interface (GUI) toolkit written in MATLAB that supports a variety of gene expression data files. The toolkit provides four models (the mean, regression, delegate, and ensemble models) to identify coexpressed genes and enables users to filter data and to select gene expression patterns by browsing the display window or by importing knowledge-based genes.
Subsequently, the utility of this analytical toolkit is demonstrated by analyzing two real-life microarray datasets from cell-cycle experiments. Overall, we have developed an interactive GUI toolkit that allows users to choose among multiple algorithms for analyzing gene expression signatures. Ning Ye, Hengfu Yin, Jingjing Liu, Xiaogang Dai, and Tongming Yin Copyright © 2015 Ning Ye et al. All rights reserved. Combined Analysis of SNP Array Data Identifies Novel CNV Candidates and Pathways in Ependymoma and Mesothelioma Mon, 22 Jun 2015 06:06:41 +0000 Copy number variation is a class of structural genomic modifications that includes the gain and loss of a specific genomic region, which may include an entire gene. Many studies have used low-resolution techniques to identify regions that are frequently lost or amplified in cancer. Researchers usually choose proprietary or non-open-source software to detect these regions because its graphical interface tends to be easier to use. In this study, we combined two different open-source packages into an innovative strategy to identify novel copy number variations and pathways associated with cancer. We used published mesothelioma and ependymoma datasets to assess our tool. We detected previously described and novel copy number variations that are associated with cancer chemotherapy resistance. We also identified altered pathways associated with these diseases, such as cell adhesion in patients with mesothelioma and negative regulation of glutamatergic synaptic transmission in ependymoma patients. In conclusion, we present a novel strategy using open-source software to identify copy number variations and altered pathways associated with cancer. Gabriel Wajnberg, Benilton S. Carvalho, Carlos G. Ferreira, and Fabio Passetti Copyright © 2015 Gabriel Wajnberg et al. All rights reserved.
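The FCmotif abstract above mentions finding enriched substrings and building position weight matrices (PWMs) from their neighborhood instances. The sketch below illustrates only those two generic building blocks, not FCmotif itself. It assumes a DNA alphabet, already-aligned instances of equal length, a uniform 0.25 background, and a pseudocount of 1; all function names are hypothetical. Raw k-mer counting stands in for enrichment here, whereas a real enrichment test would also compare counts against background sequences.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def count_kmers(seqs, k):
    """Count every length-k substring across all sequences (raw counts, no background)."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def build_pwm(instances, pseudocount=1.0, background=0.25):
    """Build a base-2 log-odds PWM from equal-length, already-aligned instances."""
    length = len(instances[0])
    if any(len(inst) != length for inst in instances):
        raise ValueError("instances must all have the same length")
    total = len(instances) + len(ALPHABET) * pseudocount
    pwm = []
    for pos in range(length):
        column = Counter(inst[pos] for inst in instances)
        # Smoothed frequency of each base at this position, relative to background.
        pwm.append({
            base: math.log2((column[base] + pseudocount) / total / background)
            for base in ALPHABET
        })
    return pwm


def score(pwm, site):
    """Sum the per-position log-odds for a candidate site of the PWM's length."""
    return sum(column[base] for column, base in zip(pwm, site))
```

For example, `build_pwm(["ACGT", "ACGT"])` gives each matching base a log-odds of 1 bit, so `score(pwm, "ACGT")` is 4.0, while mismatching sites score lower.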
Network-Based Logistic Classification with an Enhanced Solver Reveals Biomarker and Subnetwork Signatures for Diagnosing Lung Cancer Tue, 16 Jun 2015 08:08:23 +0000 Identifying biomarkers and signaling pathways is a critical step in genomic studies, in which regularization is a widely used feature extraction approach. However, most regularizers are based on -norm, and their results are not sufficiently sparse or interpretable and are asymptotically biased, especially in genomic research. Recently, we have gained a large amount of molecular interaction information about disease-related biological processes, gathered in various databases that focus on many aspects of biological systems. In this paper, we use an enhanced penalized solver to penalize a network-constrained logistic regression model, called an enhanced net, in which the predictors are based on gene expression data with biological network knowledge. Extensive simulation studies showed that our proposed approach outperforms regularization, the old penalized solver, and the Elastic net approaches in terms of classification accuracy and stability. Furthermore, we applied our method to lung cancer data and found that it achieves higher predictive accuracy than regularization, the old penalized solver, and the Elastic net approaches, while selecting fewer but informative biomarkers and pathways. Hai-Hui Huang, Yong Liang, and Xiao-Ying Liu Copyright © 2015 Hai-Hui Huang et al. All rights reserved. The Impact of Normalization Methods on RNA-Seq Data Analysis Mon, 15 Jun 2015 12:14:24 +0000 High-throughput sequencing technologies, such as the Illumina HiSeq, are powerful new tools for investigating a wide range of biological and medical problems. The massive and complex data sets produced by sequencers create a need for statistical and computational methods that can tackle the analysis and management of data.
Data normalization is one of the most crucial steps of data processing, and it must be carefully considered because it has a profound effect on the results of the analysis. In this work, we present a comprehensive comparison of five normalization methods related to sequencing depth that are widely used for transcriptome sequencing (RNA-seq) data, and of their impact on the results of gene expression analysis. Based on this study, we suggest a universal workflow that can be applied to select the optimal normalization procedure for any particular data set. The described workflow includes calculation of bias and variance values for the control genes, the sensitivity and specificity of the methods, and classification errors, as well as generation of diagnostic plots. Combining this information facilitates the selection of the most appropriate normalization method for the studied data sets and determines which methods can be used interchangeably. J. Zyprych-Walczak, A. Szabelska, L. Handschuh, K. Górczak, K. Klamecka, M. Figlerowicz, and I. Siatkowski Copyright © 2015 J. Zyprych-Walczak et al. All rights reserved. A Robust Supervised Variable Selection for Noisy High-Dimensional Data Tue, 02 Jun 2015 06:53:56 +0000 The Minimum Redundancy Maximum Relevance (MRMR) approach to supervised variable selection is a successful methodology for dimensionality reduction that is suitable for high-dimensional data observed in two or more groups. Various available versions of the MRMR approach have been designed to search for variables with the largest relevance for a classification task while controlling the redundancy of the selected set of variables. However, the usual relevance and redundancy criteria have the disadvantage of being too sensitive to outlying measurements and/or being inefficient.
We propose a novel approach called Minimum Regularized Redundancy Maximum Robust Relevance (MRRMRR), suitable for noisy high-dimensional data observed in two groups. It combines principles of regularization and robust statistics. In particular, redundancy is measured by a new regularized version of the coefficient of multiple correlation, and relevance is measured by a highly robust correlation coefficient based on least weighted squares regression with data-adaptive weights. We compare various dimensionality reduction methods on three real data sets. To investigate the influence of noise or outliers, we also perform the computations on data artificially contaminated by severe noise of various forms. The experimental results confirm the robustness of the method with respect to outliers. Jan Kalina and Anna Schlenker Copyright © 2015 Jan Kalina and Anna Schlenker. All rights reserved. Toward a Literature-Driven Definition of Big Data in Healthcare Tue, 02 Jun 2015 06:08:12 +0000 Objective. The aim of this study was to provide a definition of big data in healthcare. Methods. A systematic search of PubMed literature published up to May 9, 2014, was conducted. We recorded the number of statistical individuals and the number of variables for all papers describing a dataset. These papers were classified into fields of study. Characteristics attributed to big data by authors were also considered. Based on this analysis, a definition of big data was proposed. Results. A total of 196 papers were included. Big data can be defined as datasets with . Properties of big data are its great variety and high velocity. Big data raises challenges regarding veracity, all aspects of the workflow, the extraction of meaningful information, and the sharing of information. Big data requires new computational methods that optimize data management. Related concepts are data reuse, false knowledge discovery, and privacy issues. Conclusion. Big data is defined by volume.
Big data should not be confused with data reuse: data can be big without being reused for another purpose, for example, in omics. Conversely, data can be reused without necessarily being big, for example, in the secondary use of Electronic Medical Records (EMR) data. Emilie Baro, Samuel Degoul, Régis Beuscart, and Emmanuel Chazard Copyright © 2015 Emilie Baro et al. All rights reserved. Trends in IT Innovation to Build a Next Generation Bioinformatics Solution to Manage and Analyse Biological Big Data Produced by NGS Technologies Mon, 01 Jun 2015 14:35:07 +0000 Sequencing the human genome began in 1994, and 10 years of work were necessary to provide a nearly complete sequence. Nowadays, NGS technologies allow a whole human genome to be sequenced in a few days. This deluge of data challenges scientists in many ways, as they face data management issues and analysis and visualization drawbacks due to the limitations of current bioinformatics tools. In this paper, we describe how the NGS Big Data revolution changes the way data are managed and analysed. We present how biologists are confronted with an abundance of methods, tools, and data formats. To overcome these problems, we focus on Big Data Information Technology innovations from web and business intelligence. We underline the value of NoSQL databases, which are much more efficient than relational databases. Since Big Data leads to a loss of interactivity with data during analysis due to high processing time, we describe solutions from Business Intelligence that allow one to regain interactivity regardless of data volume. We illustrate this point with a focus on the Amadea platform. Finally, we discuss visualization challenges posed by Big Data and present the latest innovations in JavaScript graphic libraries. Alexandre G. de Brevern, Jean-Philippe Meyniel, Cécile Fairhead, Cécile Neuvéglise, and Alain Malpertuy Copyright © 2015 Alexandre G. de Brevern et al. All rights reserved.
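The MRMR abstract above describes choosing variables with high relevance to the class label while limiting redundancy among those already chosen. A minimal greedy sketch of the classic criterion follows. It assumes absolute Pearson correlation for both relevance and redundancy and the difference form (relevance minus mean redundancy); it is not the robust MRRMRR method, which replaces these with a regularized multiple correlation and a least-weighted-squares-based robust correlation. The function name is hypothetical, and constant columns (whose correlation is undefined) are assumed absent.

```python
import numpy as np


def mrmr_select(X, y, n_select):
    """Greedy MRMR selection on a samples-by-features matrix X with labels y.

    Relevance of feature j: |corr(X[:, j], y)|.
    Redundancy of j: mean |corr(X[:, j], X[:, s])| over already selected s.
    Each step adds the remaining feature maximizing relevance - redundancy.
    Assumes no constant columns (their correlation would be NaN).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_features = X.shape[1]
    relevance = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)]
    )
    feature_corr = np.abs(np.corrcoef(X, rowvar=False))

    # Start with the single most relevant feature.
    selected = [int(np.argmax(relevance))]
    remaining = [j for j in range(n_features) if j != selected[0]]
    while len(selected) < n_select and remaining:
        scores = [relevance[j] - feature_corr[j, selected].mean() for j in remaining]
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected
```

With two orthogonal binary factors `a` and `b`, label `y = a + b`, and feature columns `[a, a, b]`, both copies of `a` are equally relevant, but once one is chosen the duplicate is penalized by its perfect redundancy, so the two selected features cover `a` and `b`.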
Improved Diagnostic Multimodal Biomarkers for Alzheimer’s Disease and Mild Cognitive Impairment Thu, 28 May 2015 06:17:43 +0000 The early diagnosis of Alzheimer’s disease (AD) and mild cognitive impairment (MCI) is very important for treatment research and patient care. Few biomarkers are currently considered in clinical settings, and their use is still optional. The objective of this work was to determine whether multimodal features not previously associated with AD could improve the classification accuracy between AD, MCI, and healthy controls (HC), which may impact future AD biomarkers. For this, the Alzheimer’s Disease Neuroimaging Initiative database was mined for case-control candidates. At least 652 baseline features extracted from MRI and PET analyses, biological samples, and clinical data up to February 2014 were used. A feature selection methodology that includes a genetic algorithm search coupled to a logistic regression classifier and forward and backward selection strategies was used to explore combinations of features. This generated diagnostic models with sizes ranging from 3 to 8 features, including well-documented AD biomarkers as well as unexplored image, biochemical, and clinical features. Accuracies of 0.85, 0.79, and 0.80 were achieved for HC-AD, HC-MCI, and MCI-AD classifications, respectively, when evaluated on a blind test set. In conclusion, a set of features provided additional and independent information to well-established AD biomarkers, aiding in the classification of MCI and AD. Antonio Martínez-Torteya, Víctor Treviño, and José G. Tamez-Peña Copyright © 2015 Antonio Martínez-Torteya et al. All rights reserved. Quantitative Assessment of the Association between Genetic Variants in MicroRNAs and Colorectal Cancer Risk Wed, 20 May 2015 09:31:47 +0000 Background. The associations between polymorphisms in microRNAs and susceptibility to colorectal cancer (CRC) were inconsistent in previous studies.
This study aims to quantify the strength of the association between four common polymorphisms in microRNAs (hsa-mir-146a rs2910164, hsa-mir-149 rs2292832, hsa-mir-196a2 rs11614913, and hsa-mir-499 rs3746444) and CRC risk. Methods. We searched PubMed, Web of Knowledge, and CNKI to find relevant studies. The combined odds ratio (OR) with 95% confidence interval (95% CI) was used to estimate the strength of the association in a fixed- or random-effect model. Results. 15 studies involving 5,486 CRC patients and 7,184 controls were included. Meta-analyses showed that rs3746444 was associated with CRC risk in Caucasians (OR = 0.57, 95% CI = 0.34–0.95). In the subgroup analysis, we found significant associations between rs2910164 and CRC in hospital-based studies (OR = 1.24, 95% CI = 1.03–1.49). rs2292832 may be a high risk factor for CRC in population-based studies (OR = 1.18, 95% CI = 1.08–1.38). Conclusion. This meta-analysis showed that rs2910164 and rs2292832 may increase the risk of CRC, whereas the rs11614913 polymorphism may reduce it. rs3746444 may be associated with a decreased risk of CRC in Caucasians. Xiao-Xu Liu, Meng Wang, Dan Xu, Jian-Hai Yang, Hua-Feng Kang, Xi-Jing Wang, Shuai Lin, Peng-Tao Yang, Xing-Han Liu, and Zhi-Jun Dai Copyright © 2015 Xiao-Xu Liu et al. All rights reserved. Multiblock Discriminant Analysis for Integrative Genomic Study Sun, 17 May 2015 11:45:02 +0000 Human diseases are abnormal medical conditions in which multiple biological components are intricately involved. Nevertheless, most research has been conducted with a single type of genetic data, such as Single Nucleotide Polymorphism (SNP) or Copy Number Variation (CNV) data. Furthermore, epigenetic modifications and transcriptional regulation have to be considered, along with genomic variants, to fully understand complex human diseases.
We call the collection of multiple heterogeneous data “multiblock data.” In this paper, we propose a novel Multiblock Discriminant Analysis (MultiDA) method that provides a new integrative genomic model for multiblock analysis and an efficient algorithm for discriminant analysis. The integrative genomic model is built by exploiting representative genomic data, including SNP, CNV, DNA methylation, and gene expression. The efficient discriminant analysis algorithm identifies discriminative factors of the multiblock data. Discriminant analysis is essential for discovering biomarkers in computational biology. The performance of the proposed MultiDA was assessed by extensive simulation experiments, in which it showed outstanding performance compared with related methods. As a target application, we applied MultiDA to human brain data of psychiatric disorders. The findings and the gene regulatory network derived from the experiment are discussed. Mingon Kang, Dong-Chul Kim, Chunyu Liu, and Jean Gao Copyright © 2015 Mingon Kang et al. All rights reserved. Intelligent Informatics in Translational Medicine Wed, 06 May 2015 08:30:36 +0000 Hao-Teng Chang, Tatsuya Akutsu, Sorin Draghici, Oliver Ray, and Tun-Wen Pai Copyright © 2015 Hao-Teng Chang et al. All rights reserved. Implication of Caspase-3 as a Common Therapeutic Target for Multineurodegenerative Disorders and Its Inhibition Using Nonpeptidyl Natural Compounds Mon, 04 May 2015 13:43:02 +0000 Caspase-3 has been identified as a key mediator of neuronal apoptosis. The present study identifies caspase-3 as a common player involved in the regulation of multineurodegenerative disorders, namely, Alzheimer’s disease (AD), Parkinson’s disease (PD), Huntington’s disease (HD), and amyotrophic lateral sclerosis (ALS).
The protein interaction network prepared using the STRING database provides strong evidence of caspase-3 interactions with the metabolic cascades of these multineurodegenerative disorders, thus characterizing it as a potential therapeutic target for multiple neurodegenerative disorders. In silico molecular docking of selected nonpeptidyl natural compounds against caspase-3 revealed potent leads against this common therapeutic target. Rosmarinic acid and curcumin proved to be the most promising ligands (leads), mimicking the inhibitory action of peptidyl inhibitors with the highest Gold fitness scores of 57.38 and 53.51, respectively. These results were in close agreement with the fitness scores predicted using X-score, a consensus-based scoring function for calculating binding affinity. The nonpeptidyl inhibitors of caspase-3 identified in the present study closely mimic the inhibitory action of the previously identified peptidyl inhibitors. Since nonpeptidyl inhibitors are preferred drug candidates, the discovery of natural compounds as nonpeptidyl inhibitors is a significant step toward feasible drug development for neurodegenerative disorders. Saif Khan, Khurshid Ahmad, Eyad M. A. Alshammari, Mohd Adnan, Mohd Hassan Baig, Mohtashim Lohani, Pallavi Somvanshi, and Shafiul Haque Copyright © 2015 Saif Khan et al. All rights reserved. The TF-miRNA Coregulation Network in Oral Lichen Planus Sun, 03 May 2015 12:33:38 +0000 Oral lichen planus (OLP) is a chronic inflammatory disease that affects the oral mucosa, and some cases may eventually develop into oral squamous cell carcinoma. Therefore, pinpointing the molecular mechanisms underlying the pathogenesis of OLP is important for developing efficient treatments. Recently, the accumulation of large amounts of omics data, especially transcriptome data, has provided opportunities to investigate OLP from a systematic perspective.
In this paper, assuming that OLP-associated genes have functional relationships, we present a new approach to identify OLP-related gene modules from gene regulatory networks. In particular, we find that gene modules regulated by both transcription factors (TFs) and microRNAs (miRNAs) play important roles in the pathogenesis of OLP, and many genes in these modules have been reported to be related to OLP in the literature. Yu-Ling Zuo, Di-Ping Gong, Bi-Ze Li, Juan Zhao, Ling-Yue Zhou, Fang-Yang Shao, Zhao Jin, and Yuan He Copyright © 2015 Yu-Ling Zuo et al. All rights reserved. Prediction of Metabolic Gene Biomarkers for Neurodegenerative Disease by an Integrated Network-Based Approach Sun, 03 May 2015 11:35:44 +0000 Neurodegenerative diseases (NDs), such as Parkinson’s disease (PD) and Huntington’s disease (HD), have become increasingly common among older people worldwide. One hallmark of NDs is the intracellular accumulation of specific pathogenic proteins, which may result from abnormal function of metabolic processes. We previously developed a computational method named Met-express that predicted key enzyme-coding genes in cancer development by integrating a cancer gene coexpression network with the metabolic network. Here, we applied Met-express to predict key enzyme-coding genes in both PD and HD. Functional enrichment analysis and literature review of the predicted genes suggested that there might be some common pathogenic metabolic pathways for PD and HD. We further found that the predicted genes had significant functional associations with known disease genes, with some of them already documented as biomarkers or therapeutic targets for NDs. As such, the predicted metabolic genes may be useful as novel biomarkers not only for ND diagnosis but also for potential therapeutic treatments. Qi Ni, Xianming Su, Jingqi Chen, and Weidong Tian Copyright © 2015 Qi Ni et al. All rights reserved.
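The colorectal cancer meta-analysis above combines per-study odds ratios with 95% confidence intervals under a fixed- or random-effect model. As a hedged sketch of how such pooling is commonly done (not the authors' code), the example assumes each study is given as a 2×2 table (a, b = cases with/without the variant; c, d = controls with/without), uses inverse-variance weighting of log odds ratios for the fixed-effect estimate, and uses the DerSimonian–Laird estimate of between-study variance for the random-effect estimate. The 0.5 continuity correction and the function names are assumptions.

```python
import math


def study_log_or(a, b, c, d, correction=0.5):
    """Log odds ratio and its variance for one 2x2 table (Woolf's method)."""
    if 0 in (a, b, c, d):
        # Assumed continuity correction so zero cells do not break the logarithm.
        a, b, c, d = (x + correction for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    var = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, var


def pool(tables, z=1.96):
    """Fixed-effect (inverse-variance) and DerSimonian-Laird random-effect pooled ORs."""
    effects = [study_log_or(*t) for t in tables]
    y = [e for e, _ in effects]
    v = [var for _, var in effects]
    w = [1 / vi for vi in v]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw

    # Cochran's Q and the DerSimonian-Laird between-study variance tau^2.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(tables) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if df > 0 and c > 0 else 0.0

    w_star = [1 / (vi + tau2) for vi in v]
    sws = sum(w_star)
    random = sum(wi * yi for wi, yi in zip(w_star, y)) / sws

    def summary(est, total_weight):
        # Back-transform the pooled log OR and its Wald interval to the OR scale.
        se = math.sqrt(1 / total_weight)
        return math.exp(est), math.exp(est - z * se), math.exp(est + z * se)

    return {"fixed": summary(fixed, sw), "random": summary(random, sws),
            "Q": q, "tau2": tau2}
```

Each entry under `"fixed"` and `"random"` is `(OR, lower, upper)`. When studies agree closely, Q falls below its degrees of freedom, tau² is truncated to zero, and the two models coincide.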
Identification of Gene and MicroRNA Signatures for Oral Cancer Developed from Oral Leukoplakia Sun, 03 May 2015 11:12:40 +0000 Clinically, oral leukoplakia (OLK) may develop into oral cancer. However, the mechanism underlying this transformation is still unclear. In this work, we present a new pipeline to identify oral cancer-related genes and microRNAs (miRNAs) by integrating both gene and miRNA expression profiles. In particular, we find network modules, as well as their miRNA regulators, that play important roles in the progression from OLK to oral cancer. Among these network modules, 91.67% of genes and 37.5% of miRNAs have previously been reported in the literature to be related to oral cancer. The promising results demonstrate the effectiveness and efficiency of our proposed approach. Guanghui Zhu, Yuan He, Shaofang Yang, Beimin Chen, Min Zhou, and Xin-Jian Xu Copyright © 2015 Guanghui Zhu et al. All rights reserved. A Heparan Sulfate-Binding Cell Penetrating Peptide for Tumor Targeting and Migration Inhibition Sun, 03 May 2015 09:21:33 +0000 As heparan sulfate proteoglycans (HSPGs) are known to act as coreceptors that interact with numerous growth factors and thereby modulate downstream biological activities, overexpression of HS/HSPG on the cell surface is an increasingly reliable prognostic factor in tumor progression. Cell penetrating peptides (CPPs) are short-chain peptides developed as functionalized vectors for delivering impermeable agents. On the cell surface, negatively charged HS provides the initial attachment of basic CPPs by electrostatic interaction, leading to multiple cellular effects. Here, a functional peptide (CPPecp) has been identified from the critical HS-binding region of hRNase3, a unique RNase family member with in vitro antitumor activity. In this study, we analyze a set of HS-binding CPPs derived from natural proteins, including CPPecp.
In addition to cellular binding and internalization, CPPecp demonstrated multiple functions, including strong binding to tumor cell surfaces with higher HS expression, significant inhibition of cancer cell migration, and suppression of angiogenesis in vitro and in vivo. Moreover, unlike conventional highly basic CPPs, CPPecp enabled magnetic nanoparticles to selectively target the tumor site in vivo. Therefore, CPPecp could potentially be developed as a biomaterial for use as a diagnostic imaging agent, a therapeutic supplement, or a functionalized vector for drug delivery. Chien-Jung Chen, Kang-Chiao Tsai, Ping-Hsueh Kuo, Pei-Lin Chang, Wen-Ching Wang, Yung-Jen Chuang, and Margaret Dah-Tsyr Chang Copyright © 2015 Chien-Jung Chen et al. All rights reserved. A Survey on the Computational Approaches to Identify Drug Targets in the Postgenomic Era Tue, 28 Apr 2015 07:02:35 +0000 Identifying drug targets plays an essential role in designing new drugs and combating diseases. Unfortunately, our current knowledge about drug targets is far from comprehensive. Screening drug targets in the lab is an expensive and time-consuming procedure. In the past decade, the accumulation of various types of omics data has made it possible to develop computational approaches to predict drug targets. In this paper, we survey recent progress in computational methodologies that have been developed to predict drug targets based on different kinds of omics data and drug property data. Yan-Fen Dai and Xing-Ming Zhao Copyright © 2015 Yan-Fen Dai and Xing-Ming Zhao. All rights reserved. A Large-Scale Structural Classification of Antimicrobial Peptides Mon, 27 Apr 2015 12:43:57 +0000 Antimicrobial peptides (AMPs) are potent drug candidates against microbial organisms such as bacteria, fungi, parasites, and viruses.
AMPs have abundant sequences and structures, two fundamental resources for bioinformatics research, but analyses of how the two are associated are either nonexistent or limited to partial classifications and data. We thus present A Database of Anti-Microbial peptides (ADAM), which contains 7,007 unique sequences and 759 structures, to systematically establish comprehensive associations between AMP sequences and structures through structural folds and to provide easy access for viewing their relationships. Thirty distinct AMP structural fold clusters with more than one structure are detected, and about a thousand AMPs are associated with at least one structural fold cluster. According to ADAM, AMP structural folds are limited: AMPs cover only about 3% of the overall protein fold space. Hao-Ting Lee, Chen-Che Lee, Je-Ruei Yang, Jim Z. C. Lai, and Kuan Y. Chang Copyright © 2015 Hao-Ting Lee et al. All rights reserved. Predicting Flavin and Nicotinamide Adenine Dinucleotide-Binding Sites in Proteins Using the Fragment Transformation Method Mon, 27 Apr 2015 11:48:34 +0000 We developed a computational method to identify NAD- and FAD-binding sites in proteins. First, we extracted from the Protein Data Bank the structures of proteins that bind to at least one of these ligands. NAD-/FAD-binding residue templates were then constructed by identifying binding residues through the ligand-binding database BioLiP. The fragment transformation method was used to identify structures within query proteins that resembled the ligand-binding templates. By comparing residue types and their relative spatial positions, potential binding sites were identified and a ligand-binding potential was calculated for each residue. With the false positive rate set at 5%, our method predicted NAD- and FAD-binding sites with true positive rates of 67.1% and 68.4%, respectively.
Our method provides excellent results for identifying FAD- and NAD-binding sites in proteins, and, most importantly, it allows verification of the requirement for conserved residue types and local structures in FAD- and NAD-binding sites. Chih-Hao Lu, Chin-Sheng Yu, Yu-Feng Lin, and Jin-Yi Chen Copyright © 2015 Chih-Hao Lu et al. All rights reserved. Functional Genomics, Genetics, and Bioinformatics Wed, 22 Apr 2015 06:20:06 +0000 Youping Deng, Hongwei Wang, Ryuji Hamamoto, David Schaffer, and Shiwei Duan Copyright © 2015 Youping Deng et al. All rights reserved. Integrated Analysis of Multiscale Large-Scale Biological Data for Investigating Human Disease Mon, 20 Apr 2015 09:13:13 +0000 Tao Huang, Lei Chen, Mingyue Zheng, and Jiangning Song Copyright © 2015 Tao Huang et al. All rights reserved. Application of Systems Biology and Bioinformatics Methods in Biochemistry and Biomedicine 2014 Sun, 19 Apr 2015 11:37:29 +0000 Yudong Cai, Tao Huang, Lei Chen, and Bing Niu Copyright © 2015 Yudong Cai et al. All rights reserved. A Practical and Scalable Tool to Find Overlaps between Sequences Sun, 19 Apr 2015 10:48:30 +0000 The evolution of next-generation sequencing technology has increased the demand for efficient solutions, in terms of space and time, to several bioinformatics problems. This paper presents a practical and easy-to-implement solution to one of these problems, namely, the all-pairs suffix-prefix problem, using a compact prefix tree. The paper demonstrates an efficient construction of this time-efficient and space-economical tree data structure. The paper also presents techniques for parallel implementations of the proposed solution. Experimental evaluation indicates superior space and time performance compared with existing solutions. Results also show that the proposed technique is highly scalable in a parallel execution environment. Maan Haj Rachid and Qutaibah Malluhi Copyright © 2015 Maan Haj Rachid and Qutaibah Malluhi. All rights reserved.
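The overlap tool above addresses the all-pairs suffix-prefix problem: for every ordered pair of reads, find the longest suffix of one that is also a prefix of the other. The sketch below shows the basic idea with a plain, uncompressed prefix tree (trie) in a single thread. It is not the compact prefix tree or the parallel implementation described in the paper, and its naive walk can cost up to the square of each read's length. The function names and the `min_len` parameter are hypothetical.

```python
def build_trie(reads):
    """Prefix tree in which each node lists the reads whose prefixes pass through it."""
    root = {"children": {}, "ids": []}
    for idx, read in enumerate(reads):
        node = root
        for ch in read:
            node = node["children"].setdefault(ch, {"children": {}, "ids": []})
            node["ids"].append(idx)
    return root


def all_pairs_suffix_prefix(reads, min_len=1):
    """Map (i, j), i != j, to the longest suffix of reads[i] that is a prefix of reads[j]."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    root = build_trie(reads)
    overlaps = {}
    for i, read in enumerate(reads):
        # Suffixes are tried from longest to shortest, so the first hit
        # recorded for a pair is its longest overlap.
        for start in range(len(read) - min_len + 1):
            node = root
            for ch in read[start:]:
                node = node["children"].get(ch)
                if node is None:
                    break
            else:
                # The whole suffix was spelled out in the trie: every read
                # listed at this node begins with that suffix.
                for j in node["ids"]:
                    if j != i and (i, j) not in overlaps:
                        overlaps[(i, j)] = len(read) - start
    return overlaps
```

For the reads `["ACGTT", "GTTCA", "TCAAC"]`, this reports an overlap of 3 from read 0 to read 1 (`GTT`), 3 from read 1 to read 2 (`TCA`), and 2 from read 2 to read 0 (`AC`), plus two single-base overlaps that disappear with `min_len=2`.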
High Order Gene-Gene Interactions in Eight Single Nucleotide Polymorphisms of Renin-Angiotensin System Genes for Hypertension Association Study Sun, 19 Apr 2015 09:58:13 +0000 Several single nucleotide polymorphisms (SNPs) of renin-angiotensin system (RAS) genes are associated with hypertension (HT), but most studies focus on single-locus effects. Here, we introduce an unbalanced function based on multifactor dimensionality reduction (MDR) for multilocus genotypes to detect high-order gene-gene (SNP-SNP) interactions in unbalanced HT case and control data. Eight SNPs of three RAS genes (angiotensinogen, AGT; angiotensin-converting enzyme, ACE; angiotensin II type 1 receptor, AT1R) that showed no significant genotype differences between HT and non-HT subjects were included. In 2- to 6-locus models of SNP-SNP interaction, the SNPs of the AGT and ACE genes were associated with hypertension (bootstrapping odds ratio [Boot-OR] = 1.972–3.785; 95% confidence interval (CI) 1.26–6.21; ). In the 7- and 8-locus models, the addition of SNP A1166C of the AT1R gene improved the maximum Boot-OR values to 4.050–4.483 (CI = 2.49–7.29; ). In conclusion, epistasis networks are identified by the eight SNP-SNP interaction models. The AGT, ACE, and AT1R genes have overall effects on susceptibility to hypertension, with the SNPs of ACE having a mainly hypertension-associated effect and showing an interacting effect with SNPs of the AGT and AT1R genes. Cheng-Hong Yang, Yu-Da Lin, Shyh-Jong Wu, Li-Yeh Chuang, and Hsueh-Wei Chang Copyright © 2015 Cheng-Hong Yang et al. All rights reserved. Effect of Electrode Shape on Impedance of Single HeLa Cell: A COMSOL Simulation Thu, 16 Apr 2015 08:39:21 +0000 In disease prophylaxis, single-cell inspection provides more detailed data than conventional examinations. At the individual cell level, the electrical properties of the cell are helpful for understanding the effects of cellular behavior.
The electric field distribution affects the results of single-cell impedance measurements, and the electrode geometry affects the electric field distribution. Therefore, this study obtained numerical solutions by using the COMSOL Multiphysics package to perform FEM simulations of the effects of electrode geometry in microfluidic devices. An equivalent circuit model incorporating the PBS solution, a pair of electrodes, and a cell is used to obtain the impedance of a single HeLa cell. Simulations indicated that circle and parallel electrodes provide higher electric field strength than cross and standard electrodes at the same operating voltage. Additionally, increasing the operating voltage reduces the impedance magnitude of a single HeLa cell for all electrode shapes. A lower impedance magnitude of the single HeLa cell increases measurement sensitivity, but a higher operating voltage will damage the HeLa cell. Min-Haw Wang and Wen-Hao Chang Copyright © 2015 Min-Haw Wang and Wen-Hao Chang. All rights reserved. Evolutionary Pattern and Regulation Analysis to Support Why Diversity Functions Existed within PPAR Gene Family Members Wed, 15 Apr 2015 14:21:56 +0000 Peroxisome proliferator-activated receptor (PPAR) gene family members exhibit distinct patterns of distribution in tissues and differ in function. The purpose of this study is to investigate the evolutionary impacts on the functional diversity of PPAR members and the regulatory differences in their gene expression patterns. Sixty-three homologous sequences of PPAR genes from 31 species were collected and analyzed. The results showed that the three distinct types of the PPAR gene family may have emerged from two gene duplication events.
The conserved HOLI (ligand binding domain of hormone receptors) and ZnF_C4 (C4 zinc finger in nuclear hormone receptors) domains are essential for maintaining the basic roles of the PPAR gene family, and the variant LCR domains may be responsible for their functional divergence. The positively selected sites in the HOLI domain are beneficial for the evolution of PPARs toward functional diversity. Evolutionary variants in the promoter and 3′ UTR regions of PPARs result in different transcription factors and miRNAs being involved in regulating PPAR members, which may eventually affect their expression and tissue distribution. These results indicate that gene duplication events, selection pressure on the HOLI domain, and variants in the promoter and 3′ UTR are essential for PPAR evolution and the acquisition of diverse functions. Tianyu Zhou, Xiping Yan, Guosong Wang, Hehe Liu, Xiang Gan, Tao Zhang, Jiwen Wang, and Liang Li Copyright © 2015 Tianyu Zhou et al. All rights reserved. Relationship between Hyperuricemia and Haar-Like Features on Tongue Images Wed, 15 Apr 2015 13:36:54 +0000 Objective. To investigate differences in tongue images of subjects with and without hyperuricemia. Materials and Methods. This population-based case-control study was performed in 2012-2013. We collected data from 46 case subjects with hyperuricemia and 46 control subjects, including results of biochemical examinations and tongue images. Symmetrical Haar-like features based on integral images were extracted from tongue images. T-tests were performed to determine the ability of the extracted features to distinguish between the case and control groups. We first selected features using the common criterion , then further examined feature characteristics and performed feature selection using the means and standard deviations of the distributions in the case and control groups. Results. A total of 115,683 features were selected using the criterion .
The maximum area under the receiver operating characteristic curve (AUC) of these features was 0.877. The sensitivity of the feature with the maximum AUC was 0.800 and its specificity was 0.826 when the Youden index was maximized. Features that performed well were concentrated in the tongue root region. Conclusions. Symmetrical Haar-like features enabled discrimination of subjects with and without hyperuricemia in our sample. The locations of these discriminative features were in agreement with the interpretation of tongue appearance in traditional Chinese and Western medicine. Yan Cui, Shizhong Liao, Hongwu Wang, Hongyu Liu, Wenhua Wang, and Liqun Yin Copyright © 2015 Yan Cui et al. All rights reserved. mRMR-ABC: A Hybrid Gene Selection Algorithm for Cancer Classification Using Microarray Gene Expression Profiling Wed, 15 Apr 2015 12:38:00 +0000 The artificial bee colony (ABC) algorithm is a relatively recent swarm intelligence optimization approach. In this paper, we make the first attempt to apply the ABC algorithm to the analysis of microarray gene expression profiles. In addition, we propose an innovative feature selection algorithm, mRMR-ABC, which combines minimum redundancy maximum relevance (mRMR) feature selection with the ABC algorithm to select informative genes from microarray profiles. The new approach uses a support vector machine (SVM) classifier to measure the classification accuracy of the selected genes. We evaluate the performance of mRMR-ABC through extensive experiments on six binary and multiclass gene expression microarray datasets. Furthermore, we compare mRMR-ABC with previously known techniques. For a fair comparison, we reimplemented two of these techniques with the same parameters: mRMR combined with a genetic algorithm (mRMR-GA) and mRMR combined with a particle swarm optimization algorithm (mRMR-PSO).
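The mRMR step named in the mRMR-ABC abstract can be sketched with the usual greedy "difference" criterion: at each step, pick the gene whose mutual information with the class label, minus its mean mutual information with the genes already chosen, is largest. This is a minimal sketch assuming already-discretized expression values; the function names are hypothetical, and the ABC search and SVM evaluation that the authors layer on top are not shown.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence

def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Mutual information (nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )

def mrmr(features: List[Sequence[Hashable]], labels: Sequence[Hashable], k: int) -> List[int]:
    """Greedy mRMR (difference form): relevance minus mean redundancy."""
    relevance = [mutual_information(f, labels) for f in features]
    # Start with the single most relevant feature.
    selected = [max(range(len(features)), key=lambda i: relevance[i])]
    while len(selected) < min(k, len(features)):
        def score(i: int) -> float:
            redundancy = sum(mutual_information(features[i], features[j]) for j in selected)
            return relevance[i] - redundancy / len(selected)
        candidates = [i for i in range(len(features)) if i not in selected]
        selected.append(max(candidates, key=score))
    return selected

# Toy data: the label is the sum of two independent binary "genes" a and b.
a = [0, 0, 1, 1, 0, 0, 1, 1]
b = [0, 1, 0, 1, 0, 1, 0, 1]
labels = [x + y for x, y in zip(a, b)]
print(mrmr([a, a, b], labels, 2))  # [0, 2]
```

All three columns are equally relevant here, so ranking by relevance alone could take the duplicate of `a`; the redundancy penalty skips it and picks the complementary gene `b` instead.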
The experimental results show that mRMR-ABC achieves accurate classification using a small number of predictive genes on both the binary and multiclass datasets, compared with previously suggested methods. mRMR-ABC is therefore a promising approach for solving gene selection and cancer classification problems. Hala Alshamlan, Ghada Badr, and Yousef Alohali Copyright © 2015 Hala Alshamlan et al. All rights reserved. Identification of Novel Breast Cancer Subtype-Specific Biomarkers by Integrating Genomics Analysis of DNA Copy Number Aberrations and miRNA-mRNA Dual Expression Profiling Wed, 15 Apr 2015 11:49:05 +0000 Breast cancer is a heterogeneous disease with well-defined molecular subtypes. Array comparative genomic hybridization (aCGH) techniques have developed rapidly, and recent evidence from breast cancer studies suggests that tumors within the same gene expression subtype share similar DNA copy number aberrations (CNAs), which can be used to further subdivide subtypes. Moreover, subtype-specific miRNA expression profiles have also been proposed as novel signatures for breast cancer classification. The identification of mRNA or miRNA expression-based breast cancer subtypes is considered an instructive means of prognosis. Here, we conducted an integrated analysis based on copy number aberration data and miRNA-mRNA dual expression profiling data to identify breast cancer subtype-specific biomarkers. Interestingly, we found a group of genes residing in subtype-specific CNA regions that also display corresponding changes in their mRNA levels and in the expression of their target miRNAs. Among them, the predicted direct correlation of the BRCA1-miR-143-miR-145 pairs was selected for experimental validation. The results indicated that BRCA1 positively regulates miR-143-miR-145 expression and that miR-143-miR-145 can serve as promising novel biomarkers for breast cancer subtyping.
Our integrated genomic analysis and experimental validation provide a new framework for predicting candidate biomarkers of breast cancer subtypes and may help in understanding the potential etiology of these subtypes. Dongguo Li, Hong Xia, Zhen-ya Li, Lin Hua, and Lin Li Copyright © 2015 Dongguo Li et al. All rights reserved. Improving the Understanding of Pathogenesis of Human Papillomavirus 16 via Mapping Protein-Protein Interaction Network Wed, 15 Apr 2015 11:48:17 +0000 Human papillomavirus 16 (HPV16) carries a high risk of causing various cancers and other diseases, especially cervical cancer. Therefore, investigating the pathogenesis of HPV16 is very important for public health. We used the protein-protein interaction (PPI) network between HPV16 and human proteins to improve our understanding of its pathogenesis. Using sequence and topological features, we built a support vector machine (SVM) model to predict new interactions between HPV16 and human proteins. All interactions were comprehensively investigated and analyzed. The analysis indicated that HPV16 enlarges its scope of influence by interacting with as many human proteins as possible. These interactions alter many aspects of cell cycle progression. Furthermore, HPV16 was not only highly prone to interact with hub proteins and bottleneck proteins but could also effectively affect a broad range of signaling pathways. In addition, we found that HPV16 evolved high carcinogenicity on the condition that its own reproduction had been ensured. This work should also help provide potential new targets for antiviral therapeutics and support future experimental research. Yongcheng Dong, Qifan Kuang, Xu Dai, Rong Li, Yiming Wu, Weijia Leng, Yizhou Li, and Menglong Li Copyright © 2015 Yongcheng Dong et al. All rights reserved.
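The HPV16 abstract refers to hub and bottleneck proteins, which are commonly defined by high degree and high betweenness centrality, respectively. The following stdlib-only Python sketch computes both on a toy network, using Brandes' algorithm for betweenness. The node names, the toy edges, and the top-20% cutoff are illustrative assumptions, not the authors' data or thresholds, and their SVM interaction predictor is not reproduced.

```python
from collections import deque
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # undirected adjacency lists

def betweenness(graph: Graph) -> Dict[str, float]:
    """Unnormalized betweenness centrality (Brandes' algorithm, unweighted graph)."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        stack: List[str] = []
        preds: Dict[str, List[str]] = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}  # number of shortest paths from s
        dist = {v: -1 for v in graph}  # BFS distance from s
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair is counted once from each endpoint.
    return {v: b / 2.0 for v, b in bc.items()}

def top_fraction(scores: Dict[str, float], fraction: float) -> Set[str]:
    """Nodes whose score ranks in the top `fraction` (at least one node)."""
    k = max(1, int(len(scores) * fraction))
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

# Hypothetical toy network: one viral node "V" and four human proteins.
ppi: Graph = {
    "V": ["H1", "H2", "H3"],
    "H1": ["V"], "H2": ["V"],
    "H3": ["V", "H4"], "H4": ["H3"],
}
degree = {v: float(len(nbrs)) for v, nbrs in ppi.items()}
hubs = top_fraction(degree, 0.2)                   # high degree
bottlenecks = top_fraction(betweenness(ppi), 0.2)  # high betweenness
```

In this toy graph "V" lies on five of the shortest paths between other proteins and "H3" on three, so "V" is both the hub and the bottleneck; in real networks the two sets overlap only partially.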
Coexpression Pattern Analysis of NPM1-Associated Genes in Chronic Myelogenous Leukemia Wed, 15 Apr 2015 09:10:06 +0000 Background. Nucleophosmin 1 (NPM1) plays an important role in ribosomal synthesis and malignancies, but NPM1 mutations are rare in patients with blast-crisis or chronic-phase chronic myelogenous leukemia (CML). The NPM1-associated gene set (GCM_NPM1), comprising 116 genes including NPM1, was chosen as the candidate gene set for the coexpression analysis. We asked whether NPM1-associated genes can affect ribosomal synthesis and the translation process in CML. Results. We present a distribution-based approach for gene pair classification that identifies a disease-specific cutoff point to classify coexpressed gene pairs into strong and weak coexpression structures. Differences in the overall coexpression structure between the normal and CML groups were assessed with the two-sample Kolmogorov-Smirnov test. Our method effectively identified coexpression pattern differences in the overall structure: for the maximum deviation . Moreover, we found that genes involved in ribosomal synthesis and the translation process tended to be coexpressed in the CML group. Conclusion. Our method can identify coexpression differences between two groups. Dysregulation of ribosomal synthesis and the translation process may be related to CML. These findings may provide useful information for exploring novel CML mechanisms and for cancer treatment. Fengfeng Wang, Lawrence W. C. Chan, Nancy B. Y. Tsui, S. C. Cesar Wong, Parco M. Siu, S. P. Yip, and Benjamin Y. M. Yung Copyright © 2015 Fengfeng Wang et al. All rights reserved. Prediction of Antifungal Activity of Gemini Imidazolium Compounds Wed, 15 Apr 2015 08:44:15 +0000 The progress of antimicrobial therapy has contributed to the development of fungal strains resistant to antimicrobial drugs.
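The two-sample Kolmogorov-Smirnov test used in the NPM1 coexpression study compares two empirical distributions through their largest vertical gap. A minimal sketch of the statistic follows, assuming the inputs are coexpression values (for example, correlation coefficients of gene pairs) from the normal and CML groups; the sample values are made up, and the authors' disease-specific cutoff procedure is not shown.

```python
from bisect import bisect_right
from typing import Sequence

def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    # Both empirical CDFs are step functions that jump only at observed values,
    # so checking every pooled observation finds the supremum.
    for x in sa + sb:
        fa = bisect_right(sa, x) / len(sa)
        fb = bisect_right(sb, x) / len(sb)
        d = max(d, abs(fa - fb))
    return d

# Hypothetical coexpression (correlation) values for gene pairs in two groups.
normal = [0.10, 0.25, 0.30, 0.45]
cml = [0.30, 0.45, 0.60, 0.75]
print(ks_statistic(normal, cml))  # 0.5
```

For a p-value as well as the statistic, `scipy.stats.ks_2samp` is the usual library route.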
Since cationic surfactants have been described as good antifungals, we present a structure-activity relationship (SAR) study of a novel homologous series of 140 bis-quaternary imidazolium chlorides and analyze them with respect to their biological activity against Candida albicans, one of the major opportunistic pathogens causing a wide spectrum of diseases in humans. We characterize these compounds by a set of features describing their structure, molecular descriptors, and surface-active properties. The SAR study was conducted with the Dominance-Based Rough Set Approach (DRSA), which identifies relevant features and combinations of features that are strongly related to high antifungal activity of the compounds. The SAR study shows, moreover, that antifungal activity depends on the type of substituents and their position at the chloride moiety, as well as on the surface-active properties of the compounds. We also show that the molecular descriptors MlogP, HOMO-LUMO gap, total structure connectivity index, and Wiener index may be useful in predicting the antifungal activity of new chemical compounds. Łukasz Pałkowski, Jerzy Błaszczyński, Andrzej Skrzypczak, Jan Błaszczak, Alicja Nowaczyk, Joanna Wróblewska, Sylwia Kożuszko, Eugenia Gospodarek, Roman Słowiński, and Jerzy Krysiński Copyright © 2015 Łukasz Pałkowski et al. All rights reserved. Feature Selection Combined with Neural Network Structure Optimization for HIV-1 Protease Cleavage Site Prediction Wed, 15 Apr 2015 08:27:29 +0000 Understanding the specificity of HIV-1 protease is crucial for designing HIV-1 protease inhibitors. In this paper, a new feature selection method combined with neural network structure optimization is proposed to analyze the specificity of HIV-1 protease and find the important positions in an octapeptide that determine its cleavability.
Two newly proposed kinds of features based on the Amino Acid Index database, together with traditional orthogonal encoding features, are used in this paper, taking both physicochemical and sequence information into consideration. Feature selection results show that , , , and are the most important positions. Two feature fusion methods are used: combination fusion and decision fusion, which aim to obtain a comprehensive feature representation and improve prediction performance. Decision fusion of the subsets obtained after feature selection achieves excellent prediction performance, which shows that feature selection combined with decision fusion is an effective and useful method for HIV-1 protease cleavage site prediction. The results and analysis in this paper can provide useful guidance for designing HIV-1 protease inhibitors in the future. Hui Liu, Xiaomiao Shi, Dongmei Guo, Zuowei Zhao, and Yimin Copyright © 2015 Hui Liu et al. All rights reserved. Identification of Novel Potential Vaccine Candidates against Tuberculosis Based on Reverse Vaccinology Wed, 15 Apr 2015 08:06:14 +0000 Tuberculosis (TB), a chronic infectious disease caused by Mycobacterium tuberculosis, is considered the second leading cause of death worldwide. The limited efficacy of the bacillus Calmette-Guérin (BCG) vaccine against pulmonary TB and the emergence of multidrug-resistant TB underscore the need for more efficacious vaccines. Reverse vaccinology uses the entire proteome of a pathogen to select the best vaccine antigens by in silico approaches. The M. tuberculosis H37Rv proteome was analyzed with the NERVE (New Enhanced Reverse Vaccinology Environment) prediction software to identify potential vaccine targets; the resulting 331 proteins were further analyzed with VaxiJen to determine their antigenicity values.
Only candidates with an antigenicity value ≥0.5 and an adhesin probability of 50%, and without homology with human proteins or transmembrane regions, were selected, resulting in 73 antigens. These proteins were grouped by family into seven groups and analyzed by amino acid sequence alignment, and 16 representative proteins were selected. For each candidate, a literature search and protein analysis with different bioinformatics tools, as well as a simulation of the immune response, were conducted. Finally, we selected six novel vaccine candidates, EsxL, PE26, PPE65, PE_PGRS49, PBP1, and Erp, from M. tuberculosis that can be used to improve existing TB vaccines or design new ones. Gloria P. Monterrubio-López, Jorge A. González-Y-Merchand, and Rosa María Ribas-Aparicio Copyright © 2015 Gloria P. Monterrubio-López et al. All rights reserved. Transcriptomic Analysis of mRNAs in Human Monocytic Cells Expressing the HIV-1 Nef Protein and Their Exosomes Wed, 15 Apr 2015 07:57:37 +0000 The Nef protein of human immunodeficiency virus (HIV) promotes viral replication and progression to AIDS. Besides its well-studied effects on intracellular signaling, Nef also functions through its secretion in exosomes, which are nanovesicles that contain proteins, microRNAs, and mRNAs and are important for intercellular communication. Nef expression enhances exosome secretion, and these exosomes can enter uninfected CD4 T cells, leading to apoptotic death. We have recently reported the first miRNome analysis of exosomes secreted from Nef-expressing U937 monocytic cells. Here we present a genome-wide transcriptome analysis of Nef-expressing U937 cells and their exosomes. We identified four key mRNAs preferentially retained in Nef-expressing cells; these code for MECP2, HMOX1, AARSD1, and ATF2 and are important for chromatin modification and gene expression. Interestingly, their target miRNAs are exported in exosomes.
We also identified three key mRNAs selectively secreted in exosomes from Nef-expressing U937 cells, whose corresponding miRNAs are preferentially retained in the cells. These mRNAs, AATK, SLC27A1, and CDKAL, are important in apoptosis and fatty acid transport. Thus, our study identifies selectively expressed mRNAs in Nef-expressing U937 cells and their exosomes and supports a new mode of intercellular regulation by the HIV-1 Nef protein. Madeeha Aqil, Saurav Mallik, Sanghamitra Bandyopadhyay, Ujjwal Maulik, and Shahid Jameel Copyright © 2015 Madeeha Aqil et al. All rights reserved. Predict and Analyze Protein Glycation Sites with the mRMR and IFS Methods Wed, 15 Apr 2015 06:14:19 +0000 Glycation is a nonenzymatic process in which proteins react with reducing sugar molecules. The identification of glycation sites in proteins may provide guidelines for understanding the biological function of protein glycation. In this study, we developed a computational method to predict protein glycation sites using a support vector machine classifier. The experimental results showed a prediction accuracy of 85.51% and an overall Matthews correlation coefficient (MCC) of 0.70. Feature analysis indicated that the composition of k-spaced amino acid pairs feature contributed the most to glycation site prediction. Yan Liu, Wenxiang Gu, Wenyi Zhang, and Jianan Wang Copyright © 2015 Yan Liu et al. All rights reserved. A Gas Chromatography-Mass Spectrometry Based Study on Urine Metabolomics in Rats Chronically Poisoned with Hydrogen Sulfide Tue, 14 Apr 2015 17:02:10 +0000 Gas chromatography-mass spectrometry (GC-MS) in combination with multivariate statistical analysis was applied to explore the metabolic variability in the urine of rats chronically poisoned with hydrogen sulfide (H2S) relative to controls. The changes in endogenous metabolites were studied by partial least squares-discriminant analysis (PLS-DA) and independent-samples t-test.
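The independent-samples t-test mentioned in the H2S metabolomics abstract compares one metabolite's mean level between two groups. The sketch below computes Student's pooled-variance t statistic and its degrees of freedom with the standard library; the intensity values are hypothetical, and the PLS-DA step is not covered.

```python
import math
import statistics
from typing import Sequence, Tuple

def independent_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Student's independent-samples t statistic (pooled variance) and its df."""
    na, nb = len(a), len(b)
    # Pooled sample variance weights each group's variance by its df.
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se, na + nb - 2

# Hypothetical normalized intensities of one metabolite.
control = [1.0, 2.0, 3.0]
poisoned = [4.0, 5.0, 6.0]
t, df = independent_t(control, poisoned)
```

If the group variances differ markedly, Welch's unequal-variance form is the safer choice; `scipy.stats.ttest_ind` (with `equal_var=False` for Welch) also returns p-values.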
The metabolic pattern of the H2S-poisoned group was separated from that of the control group, suggesting that the metabolic profiles of H2S-poisoned rats were markedly different from those of the controls. Moreover, compared with the control group, the levels of alanine, d-ribose, tetradecanoic acid, L-aspartic acid, pentanedioic acid, cholesterol, acetate, and oleic acid in the urine of the poisoned group decreased, while the levels of glycine, d-mannose, arabinofuranose, and propanoic acid increased. These metabolites are related to amino acid metabolism as well as energy and lipid metabolism in vivo. Studying metabolomics using GC-MS allows for a comprehensive overview of the metabolism of the living body. This technique can be employed to decipher the mechanism of chronic H2S poisoning, thus promoting the use of metabolomics in clinical toxicology. Mingjie Deng, Meiling Zhang, Fa Sun, Jianshe Ma, Lufeng Hu, Xuezhi Yang, Guanyang Lin, and Xianqin Wang Copyright © 2015 Mingjie Deng et al. All rights reserved. An Integrated Modeling and Experimental Approach to Study the Influence of Environmental Nutrients on Biofilm Formation of Pseudomonas aeruginosa Tue, 14 Apr 2015 16:59:46 +0000 The availability of nutrient components in the environment was identified as a critical regulator of virulence and biofilm formation in Pseudomonas aeruginosa. This work proposes the first systems-biology approach to quantify microbial biofilm formation upon changes in nutrient availability in the environment. Specifically, changes in the fluxes of metabolic reactions positively associated with P. aeruginosa biofilm formation were used to monitor the tendency of P. aeruginosa to form a biofilm. The uptake rates of nutrient components were changed according to the change in nutrient availability. We found that adding each of the eleven amino acids (Arg, Tyr, Phe, His, Iso, Orn, Pro, Glu, Leu, Val, and Asp) to minimal medium promoted P. aeruginosa biofilm formation.
Both modeling and experimental approaches were further developed to quantify P. aeruginosa biofilm formation at four availability levels for each of three ions: ferrous ions, sulfate, and phosphate. The developed modeling approach correctly predicted the amount of biofilm formation. Comparison of reaction flux changes across nutrient concentrations showed that the metabolic reactions P. aeruginosa uses to regulate biofilm formation are mainly involved in arginine metabolism, glutamate production, magnesium transport, acetate metabolism, and the TCA cycle. Zhaobin Xu, Sabina Islam, Thomas K. Wood, and Zuyi Huang Copyright © 2015 Zhaobin Xu et al. All rights reserved. Computer-Simulated Biopsy Marking System for Endoscopic Surveillance of Gastric Lesions: A Pilot Study Tue, 14 Apr 2015 16:57:39 +0000 Endoscopic tattooing with India ink injection for surveillance of premalignant gastric lesions is technically cumbersome and may not be durable. The aim of the study is to evaluate the accuracy of a novel computer-simulated biopsy marking system (CSBMS) developed for the endoscopic marking of gastric lesions. Twenty-five patients with a history of gastric intestinal metaplasia received both CSBMS-guided marking and India ink injection at five points in the stomach at index endoscopy. A second endoscopy was performed at three months. The primary outcome was the accuracy of CSBMS (the distance between the CSBMS probe-guided site and the tattoo site, measured by CSBMS). The mean accuracy of CSBMS was  mm at the angularis,  mm at the antral lesser curvature,  mm at the antral greater curvature,  mm at the antral anterior wall, and  mm at the antral posterior wall. CSBMS ( versus seconds; ) required less procedure time than endoscopic tattooing. No adverse events were encountered. On follow-up endoscopy, CSBMS accurately identified gastric sites previously marked by endoscopic tattooing to within 1 cm. Weiling Hu, Bin Wang, Leimin Sun, Shujie Chen, Liangjing Wang, Kan Wang, Jiaguo Wu, John J.
Kim, Jiquan Liu, Ning Dai, Huilong Duan, and Jianmin Si Copyright © 2015 Weiling Hu et al. All rights reserved. Intuitive Web-Based Experimental Design for High-Throughput Biomedical Data Tue, 14 Apr 2015 11:18:12 +0000 Big data bioinformatics aims at drawing biological conclusions from huge and complex biological datasets. Added value from the analysis of big data, however, is only possible if the data is accompanied by accurate metadata annotation. Particularly in high-throughput experiments, intelligent approaches are needed to keep track of the experimental design, including the conditions that are studied as well as information that might be useful for failure analysis or future experiments. In addition to the management of this information, researchers urgently need means for integrated design and interfaces for structured data annotation. Here, we propose a factor-based experimental design approach that enables scientists to easily create large-scale experiments with the help of a web-based system. We present a novel implementation of a web-based interface allowing the collection of arbitrary metadata. To exchange and edit information, we provide a spreadsheet-based, human-readable format. Subsequently, sample sheets with identifiers and metainformation for data generation facilities can be created. Data files created after measurement of the samples can be uploaded to a datastore, where they are automatically linked to the previously created experimental design model. Andreas Friedrich, Erhan Kenar, Oliver Kohlbacher, and Sven Nahnsen Copyright © 2015 Andreas Friedrich et al. All rights reserved. A Method for Generating New Datasets Based on Copy Number for Cancer Analysis Wed, 08 Apr 2015 12:22:40 +0000 New data sources for the analysis of cancer data are rapidly supplementing the large number of gene-expression markers used by current analysis methods.
Significant among these new sources are copy number variation (CNV) datasets, which typically enumerate several hundred thousand CNVs distributed throughout the genome. Several useful algorithms allow systems-level analyses of such datasets. However, these rich data sources have not yet been analyzed as deeply as gene-expression data. To address this issue, the extensive toolsets used for analyzing expression data in cancerous and noncancerous tissue (e.g., gene set enrichment analysis and phenotype prediction) could be redirected to extract a great deal of predictive information from CNV data, in particular data derived from cancers. Here we present a software package that preprocesses standard Agilent copy number datasets into a form to which essentially all expression analysis tools can be applied. We illustrate the use of this toolset in predicting the survival time of patients with ovarian cancer or glioblastoma multiforme and also provide an analysis of gene- and pathway-level deletions in these two types of cancer. Shinuk Kim, Mark Kon, and Hyunsik Kang Copyright © 2015 Shinuk Kim et al. All rights reserved. Identifying Highly Penetrant Disease Causal Mutations Using Next Generation Sequencing: Guide to Whole Process Mon, 06 Apr 2015 12:30:15 +0000 Recent technological advances have created challenges for geneticists and a need to adapt to a wide range of new bioinformatics tools and an expanding wealth of publicly available data (e.g., mutation databases and software). This wide range of methods and the diversity of file formats used in sequence analysis are a significant issue, with a considerable amount of time spent before anyone can even attempt to analyse the genetic basis of human disorders. Another point to consider is that although many possess “just enough” knowledge to analyse their data, they do not make full use of the tools and databases that are available and also do not fully understand how their data was created.
The primary aim of this review is to document some of the key approaches and provide an analysis schema to make the analysis process more efficient and reliable in the context of discovering highly penetrant causal mutations/genes. This review will also compare the methods used to identify highly penetrant variants when data is obtained from consanguineous as opposed to nonconsanguineous individuals, and when Mendelian disorders are analysed as opposed to common complex disorders. A. Mesut Erzurumluoglu, Santiago Rodriguez, Hashem A. Shihab, Denis Baird, Tom G. Richardson, Ian N. M. Day, and Tom R. Gaunt Copyright © 2015 A. Mesut Erzurumluoglu et al. All rights reserved. Genetic Polymorphism in Extracellular Regulators of Wnt Signaling Pathway Sun, 05 Apr 2015 10:46:55 +0000 The Wnt signaling pathway is mediated by a family of secreted glycoproteins through canonical and noncanonical mechanisms. The signaling pathways are regulated by various modulators, which are classified into two classes on the basis of their interaction with either Wnt or its receptors. Secreted frizzled-related proteins (sFRPs) belong to the class that binds to Wnt proteins and antagonizes the Wnt signaling pathway. The other class consists of the Dickkopf (DKK) protein family, which binds to the Wnt receptor complex. The present review discusses the disease associations of various polymorphisms in Wnt signaling modulators. This review also highlights that some of the sFRPs and DKKs are unable to act as antagonists of the Wnt signaling pathway, and thus their function needs to be explored more extensively. Garima Sharma, Ashish Ranjan Sharma, Eun-Min Seo, and Ju-Suk Nam Copyright © 2015 Garima Sharma et al. All rights reserved.
Integrative Analysis of CRISPR/Cas9 Target Sites in the Human HBB Gene Tue, 31 Mar 2015 09:39:46 +0000 Recently, the clustered regularly interspaced short palindromic repeats (CRISPR) system has emerged as a powerful customizable artificial nuclease to facilitate precise genetic correction for tissue regeneration and isogenic disease modeling. However, previous studies reported substantial off-target activity of the CRISPR system in human cells, and the enormous number of putative off-target sites is labor-intensive to validate experimentally, which motivates bioinformatics methods for the rational design of CRISPR systems and prediction of their potential off-target effects. Here, we describe an integrative analytical process to identify specific CRISPR target sites in the human β-globin gene (HBB) and predict their off-target effects. Our method includes off-target analysis in both coding and noncoding regions, which previous studies neglected. We found that CRISPR target sites in the introns have fewer off-target sites in coding regions than those in the exons. Remarkably, target sites containing certain transcription factor motifs have enriched binding sites of the relevant transcription factors in their off-target sets. We also found that the intron sites have fewer SNPs, which leads to less variation in CRISPR efficiency among individuals during clinical applications. Our studies provide a standard analytical procedure to select specific CRISPR targets for genetic correction. Yumei Luo, Detu Zhu, Zhizhuo Zhang, Yaoyong Chen, and Xiaofang Sun Copyright © 2015 Yumei Luo et al. All rights reserved. In Silico Search of Energy Metabolism Inhibitors for Alternative Leishmaniasis Treatments Mon, 30 Mar 2015 13:56:04 +0000 Leishmaniasis is a complex disease that affects mammals and is caused by approximately 20 distinct protozoa from the genus Leishmania.
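Identifying candidate CRISPR target sites, as in the HBB study, starts with scanning both strands for protospacers adjacent to a PAM. The sketch below assumes the commonly used Streptococcus pyogenes Cas9, with a 20-nt protospacer followed by an NGG PAM, which the abstract does not state explicitly; it also counts mismatches between two protospacers as a crude off-target proxy. The helper names are hypothetical, and the study's coding/noncoding off-target search, transcription factor motif analysis, and SNP checks are not reproduced.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]

def find_spcas9_sites(seq: str, spacer_len: int = 20) -> List[Tuple[int, str, str]]:
    """List (forward-strand start, strand, protospacer) for sites followed by an NGG PAM."""
    seq = seq.upper()
    sites = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # The PAM occupies s[i + spacer_len : i + spacer_len + 3]; its last two bases must be GG.
        for i in range(len(s) - spacer_len - 2):
            if s[i + spacer_len + 1:i + spacer_len + 3] == "GG":
                # Convert reverse-strand offsets back to forward coordinates.
                start = i if strand == "+" else len(s) - i - spacer_len
                sites.append((start, strand, s[i:i + spacer_len]))
    return sites

def mismatches(a: str, b: str) -> int:
    """Hamming distance between two equal-length protospacers."""
    return sum(x != y for x, y in zip(a, b))
```

For example, `find_spcas9_sites("AAAACGG", 4)` reports a single forward-strand site `(0, "+", "AAAA")`; a shortened `spacer_len` is used only to keep toy inputs readable.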
Leishmaniasis is an endemic disease that exerts a large socioeconomic impact on poor and developing countries. The current treatment for leishmaniasis is complex, expensive, and poorly efficacious. Thus, there is an urgent need to develop new drugs that are more selective and less expensive. The energy metabolism pathways of Leishmania include several interesting targets for specific inhibitors. In the present study, we sought to establish which energy metabolism enzymes in Leishmania could be targets for inhibitors that have already been approved for the treatment of other diseases. We were able to identify 94 genes and 93 Leishmania energy metabolism targets. Using each gene’s designation as a search criterion in the TriTrypDB database, we located the predicted peptide sequences, which in turn were used to interrogate the DrugBank, Therapeutic Target Database (TTD), and PubChem databases. We identified 44 putative targets, 11 of which are predicted to be amenable to inhibition by drugs that have already been approved for use in humans. We propose that these drugs should be experimentally tested and potentially used in the treatment of leishmaniasis. Lourival A. Silva, Marina C. Vinaud, Ana Maria Castro, Pedro Vítor L. Cravo, and José Clecildo B. Bezerra Copyright © 2015 Lourival A. Silva et al. All rights reserved. Evaluation and Application of the Strand-Specific Protocol for Next-Generation Sequencing Sun, 29 Mar 2015 07:06:19 +0000 Next-generation sequencing (NGS) has become a powerful sequencing tool, applied in a wide range of biological studies. However, the traditional sample preparation protocol for NGS is non-strand-specific (NSS), leading to biased expression estimates for transcripts that overlap on the antisense strand. Strand-specific (SS) protocols have recently been developed. In this study, we prepared the same RNA sample by using the SS and NSS protocols, followed by sequencing on the Illumina HiSeq platform.
Using real-time quantitative PCR as a standard, we first showed that the SS protocol estimates gene expression more precisely than the NSS protocol, particularly for genes that overlap on the antisense strand. In addition, we showed that the sequence reads from the SS protocol are comparable with those from conventional NSS protocols in many respects. Finally, we mapped a fraction of sequence reads to the antisense strands of known genes, where no genes had originally been annotated. Using sequence assembly and PCR validation, we succeeded in identifying and characterizing these novel antisense genes. Our results show that the SS protocol performs more accurately than the traditional NSS protocol and can be applied in future studies. Kuo-Wang Tsai, Bill Chang, Cheng-Tsung Pan, Wei-Chen Lin, Ting-Wen Chen, and Sung-Chou Li Copyright © 2015 Kuo-Wang Tsai et al. All rights reserved. Distributed Artificial Intelligence Models for Knowledge Discovery in Bioinformatics Wed, 25 Mar 2015 13:17:46 +0000 Juan M. Corchado, Isabelle Bichindaritz, and Juan F. De Paz Copyright © 2015 Juan M. Corchado et al. All rights reserved. A Linear-RBF Multikernel SVM to Classify Big Text Corpora Mon, 23 Mar 2015 08:13:54 +0000 Support vector machine (SVM) is a powerful technique for classification. However, SVM is not suitable for classifying large datasets or text corpora, because the training complexity of SVMs depends strongly on the input size. Recent developments in the literature on SVMs and other kernel methods emphasize the need to consider multiple kernels or parameterizations of kernels because they provide greater flexibility. This paper presents a multikernel SVM for managing high-dimensional data, which provides automatic parameterization at low computational cost and improves results over SVMs parameterized by brute-force search.
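The "Linear-RBF multikernel" idea in the text-classification paper can be illustrated with the simplest multikernel construction: a convex combination of a linear and an RBF kernel, which is itself a valid kernel. This is only a sketch under assumptions; the mixing weight and `gamma` are hypothetical, and the paper's per-cluster (term-slice) construction and automatic parameterization are not reproduced.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]

def linear_kernel(x: Vector, y: Vector) -> float:
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x: Vector, y: Vector, gamma: float = 1.0) -> float:
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def linear_rbf_kernel(x: Vector, y: Vector, weight: float = 0.5, gamma: float = 1.0) -> float:
    """Convex combination (0 <= weight <= 1) of a linear and an RBF kernel."""
    return weight * linear_kernel(x, y) + (1.0 - weight) * rbf_kernel(x, y, gamma)

def gram_matrix(X: List[Vector], kernel: Callable[[Vector, Vector], float]) -> List[List[float]]:
    """Pairwise kernel values for all training vectors."""
    return [[kernel(a, b) for b in X] for a in X]

X = [[1.0, 0.0], [0.0, 1.0]]
K = gram_matrix(X, linear_rbf_kernel)
```

A Gram matrix built this way could be passed to scikit-learn's `SVC(kernel="precomputed")`; with sparse, very high-dimensional text vectors the linear part typically dominates the cost.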
The model splits the dataset into cohesive term slices (clusters) to construct a defined structure (multikernel). The new approach is tested on different text corpora. Experimental results show that the new classifier has good accuracy compared with the classic SVM, while its training is significantly faster than that of several other SVM classifiers. R. Romero, E. L. Iglesias, and L. Borrajo Copyright © 2015 R. Romero et al. All rights reserved. A Network Flow Approach to Predict Protein Targets and Flavonoid Backbones to Treat Respiratory Syncytial Virus Infection Mon, 23 Mar 2015 06:08:56 +0000 Background. Respiratory syncytial virus (RSV) infection is the major cause of lower respiratory tract disease in infants and young children. Attempts to develop effective vaccines or pharmacological treatments that inhibit RSV infection without undesired effects on human health have been unsuccessful. However, RSV infection has been reported to be affected by flavonoids. The mechanisms underlying viral inhibition induced by these compounds are largely unknown, making the development of new drugs difficult. Methods. To understand the mechanisms by which flavonoids inhibit RSV infection, a systems pharmacology-based study was performed using microarray data from primary cultures of human bronchial cells infected by RSV, together with compound-proteomic interaction data available for Homo sapiens. Results. After an initial evaluation of 26 flavonoids, 5 compounds (resveratrol, quercetin, myricetin, apigenin, and tricetin) were identified through topological analysis of a major chemical-protein (CP) and protein-protein interaction (PPI) network. In a nonclustered form, these flavonoids directly regulate the activity of two protein bottlenecks involved in inflammation and apoptosis. Conclusions.
Our findings may help uncover the mechanisms of action of early RSV infection and provide chemical backbones and their protein targets in the difficult quest to develop new effective drugs. José Eduardo Vargas, Renato Puga, Joice de Faria Poloni, Luis Fernando Saraiva Macedo Timmers, Barbara Nery Porto, Osmar Norberto de Souza, Diego Bonatto, Paulo Márcio Condessa Pitrez, and Renato Tetelbom Stein Copyright © 2015 José Eduardo Vargas et al. All rights reserved. Identification of Novel Thyroid Cancer-Related Genes and Chemicals Using Shortest Path Algorithm Sun, 22 Mar 2015 11:26:51 +0000 Thyroid cancer is a typical endocrine malignancy. Over the past three decades, the continued growth in its incidence has made the design of effective treatments urgent. To this end, it is necessary to uncover the mechanisms underlying this disease, and identifying thyroid cancer-related genes and chemicals helps in understanding them. In this study, we generalized some previous methods to discover both disease genes and chemicals. The method is based on a shortest path algorithm and was applied to discover novel thyroid cancer-related genes and chemicals. Analysis of the genes and chemicals finally obtained suggests that some of them are crucial to the formation and development of thyroid cancer, indicating that the proposed method is effective for discovering novel disease genes and chemicals. Yang Jiang, Peiwei Zhang, Li-Peng Li, Yi-Chun He, Ru-jian Gao, and Yu-Fei Gao Copyright © 2015 Yang Jiang et al. All rights reserved. A Meta-Analysis Strategy for Gene Prioritization Using Gene Expression, SNP Genotype, and eQTL Data Sun, 22 Mar 2015 10:56:57 +0000 To understand disease pathogenesis, improve medical diagnosis, or discover effective drug targets, it is important to identify significant genes deeply involved in human disease.
For this purpose, many earlier approaches attempted to prioritize candidate genes using gene expression profiles or SNP genotype data, but they often produce many false positives. To address this issue, we propose a meta-analysis strategy for gene prioritization that integrates three different genetic resources: gene expression data, single nucleotide polymorphism (SNP) genotype data, and expression quantitative trait loci (eQTL) data. For integration, we used an improved version of the technique for order of preference by similarity to ideal solution (TOPSIS) to combine scores from the distinct resources. The method was evaluated on two publicly available prostate cancer and lung cancer datasets to identify disease-related genes. The proposed strategy outperformed conventional methods in discovering significant disease-related genes from several types of genetic resources, while making good use of potential complementarities among the available resources. Jingmin Che and Miyoung Shin Copyright © 2015 Jingmin Che and Miyoung Shin. All rights reserved. Analysis of Environmental Stress Factors Using an Artificial Growth System and Plant Fitness Optimization Sun, 22 Mar 2015 10:35:00 +0000 The environment promotes evolution. Evolutionary processes represent environmental adaptations over long time scales; evolution of crop genomes cannot be induced within the relatively short span of a human generation. Extreme environmental conditions can accelerate evolution, but such conditions are often stressful and disruptive. Artificial growth systems can be used to induce and select genomic variation by changing external environmental conditions, thus accelerating evolution. Using cloud computing and big-data analysis, we analyzed environmental stress factors for Pleurotus ostreatus by assessing, evaluating, and predicting information about the growth environment.
By indexing environmental stress, the growth environment can be precisely controlled, and the approach can be developed into a technology for improving crop quality and production. Meonghun Lee and Hyun Yoe Copyright © 2015 Meonghun Lee and Hyun Yoe. All rights reserved. Agent-Based Spatiotemporal Simulation of Biomolecular Systems within the Open Source MASON Framework Sun, 22 Mar 2015 10:04:24 +0000 Agent-based modelling is being used to represent biological systems with increasing frequency and success. This paper presents the implementation of a new tool for biomolecular reaction modelling in the open source Multiagent Simulator of Neighborhoods (MASON) framework. The rationale behind this new tool is the need to describe interactions at the molecular level in order to capture emergent and meaningful biological behaviour. We are particularly interested in characterising and quantifying the various effects that facilitate biocatalysis. Enzymes may display high specificity for their substrates, and this information is crucial to the engineering and optimisation of bioprocesses. Simulation results demonstrate that molecule distributions, reaction rate parameters, and structural parameters can be adjusted separately in the simulation, allowing a comprehensive study of individual effects in the context of realistic cell environments. A higher percentage of collisions that result in a reaction increases the affinity of the enzyme for the substrate, while a faster reaction (i.e., a higher turnover number) leads to a smaller number of time steps. As expected, slower diffusion rates and molecular crowding (physical hurdles) decrease the collision rate of reactants, hence reducing the reaction rate. The random distribution of molecules also affects the results significantly. Gael Pérez-Rodríguez, Martín Pérez-Pérez, Daniel Glez-Peña, Florentino Fdez-Riverola, Nuno F. Azevedo, and Anália Lourenço Copyright © 2015 Gael Pérez-Rodríguez et al. All rights reserved.
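To make the collision-and-reaction reasoning in the MASON abstract concrete, here is a toy lattice sketch in Python. It is not the MASON API (MASON is a Java toolkit) and not the authors' tool; the function `simulate` and its parameters (`grid_size`, `p_react`, `seed`, and so on) are hypothetical. Enzymes and substrates random-walk on a small toroidal grid, and a substrate that lands on an enzyme's cell is converted to product with probability `p_react`, a stand-in for the fraction of collisions that result in a reaction. Diffusion rate and molecular crowding are not modelled.

```python
import random


def simulate(grid_size, n_enzymes, n_substrates, p_react, max_steps=10_000, seed=0):
    """Toy lattice model (hypothetical, not MASON): enzymes and substrates
    random-walk on a toroidal grid; a substrate that lands on a cell occupied
    by an enzyme is converted to product with probability p_react.

    Returns (steps_taken, products_formed, substrates_remaining)."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]

    def random_cell():
        return (rng.randrange(grid_size), rng.randrange(grid_size))

    def step(pos):
        # One random move on the torus (coordinates wrap around the edges).
        dx, dy = rng.choice(moves)
        return ((pos[0] + dx) % grid_size, (pos[1] + dy) % grid_size)

    enzymes = [random_cell() for _ in range(n_enzymes)]
    substrates = [random_cell() for _ in range(n_substrates)]
    products = 0

    for t in range(1, max_steps + 1):
        enzymes = [step(e) for e in enzymes]
        occupied = set(enzymes)
        remaining = []
        for s in substrates:
            s = step(s)
            # A collision only becomes a reaction with probability p_react.
            if s in occupied and rng.random() < p_react:
                products += 1
            else:
                remaining.append(s)
        substrates = remaining
        if not substrates:
            return t, products, 0
    return max_steps, products, len(substrates)


if __name__ == "__main__":
    for p in (0.1, 0.5, 1.0):
        steps, made, left = simulate(grid_size=11, n_enzymes=5, n_substrates=20, p_react=p)
        print(f"p_react={p}: {made} products, {left} substrates left after {steps} steps")
```

In this toy model, raising `p_react` typically shortens the run, and changing `seed` changes the initial placement of molecules and hence the outcome. An odd grid size is used in the example because, on an even-sized torus where both walkers move every step, an enzyme and a substrate that start on cells of opposite parity can never meet.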
Using the eServices Platform for Detecting Behavior Patterns Deviation in the Elderly Assisted Living: A Case Study Sun, 22 Mar 2015 09:33:56 +0000 The world's aging population is rising, and the elderly are increasingly isolated socially and geographically. As a consequence, they often need assistance that is not provided in time. In this paper, we present a solution that follows the CRISP-DM methodology to detect deviations in elderly people's behavior patterns that may indicate possible risk situations. To obtain these patterns, many variables are aggregated to ensure the reliability of the alert system and minimize false-positive alerts. These variables comprise information provided by a body area network (BAN), by environmental sensors, and by the elderly person's interactions with a service provider platform called eServices (Elderly Support Service Platform). eServices is a scalable platform that aggregates a service ecosystem developed specifically for elderly people. Recognizing these patterns will in turn trigger the appropriate response. As the system evolves, it will learn to predict potentially dangerous situations for a specific user, acting preventively and ensuring the elderly person's safety and well-being. Because the eServices platform is still in development, synthetic data, based on a real data sample and empirical knowledge, are being used to populate the initial dataset. The presented work is a proof of concept of knowledge extraction using the eServices platform information. Although it does not use real data, this work proves to be an asset, achieving good performance in preventing alert situations. Isabel Marcelino, David Lopes, Michael Reis, Fernando Silva, Rosalía Laza, and António Pereira Copyright © 2015 Isabel Marcelino et al. All rights reserved.
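The eServices abstract argues that aggregating many variables reduces false-positive alerts. A small Python sketch can illustrate that reasoning, under assumptions of our own: it is not the eServices implementation or the CRISP-DM models the authors built. Each variable is compared with the user's own baseline using a z-score, and an alert is raised only when at least `min_deviating` variables deviate at the same time. The function name, the variable names, the data sources in the comments, and the thresholds are all hypothetical.

```python
from statistics import mean, stdev


def deviation_alert(history, today, z_threshold=2.0, min_deviating=2):
    """Flag behaviour-pattern deviations for one user (hypothetical sketch).

    history: dict mapping variable name -> list of past daily values
    today:   dict mapping variable name -> today's value
    Returns (alert, deviating_variables)."""
    deviating = []
    for name, values in history.items():
        mu, sigma = mean(values), stdev(values)
        if sigma == 0:
            continue  # flat baseline: skip instead of dividing by zero
        z = (today[name] - mu) / sigma
        if abs(z) > z_threshold:
            deviating.append(name)
    # Requiring several simultaneous deviations trades some sensitivity
    # for fewer false-positive alerts.
    return len(deviating) >= min_deviating, deviating


if __name__ == "__main__":
    history = {
        "steps": [5000, 5200, 4800, 5100, 4900],  # illustrative BAN-style signal
        "sleep_hours": [7, 7.5, 6.5, 7, 7],       # illustrative sensor signal
        "platform_logins": [3, 4, 3, 2, 3],       # illustrative platform usage
    }
    print(deviation_alert(history, {"steps": 1200, "sleep_hours": 7, "platform_logins": 3}))
    # -> (False, ['steps']): one deviating variable is not enough to alert
```

The `min_deviating` rule is one simple way to aggregate evidence across sources; a learned model, as the abstract anticipates for a maturing system, could replace it once real data are available.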
A Distributed Multiagent System Architecture for Body Area Networks Applied to Healthcare Monitoring Sun, 22 Mar 2015 09:23:02 +0000 In recent years, the area of health monitoring has grown significantly, attracting the attention of both academia and the commercial sector. At the same time, the availability of new biomedical sensors and suitable network protocols has led to the appearance of a new generation of wireless sensor networks, the so-called wireless body area networks. Nowadays, these networks are routinely used for continuous monitoring of people's vital parameters, movement, and surrounding environment, but the large volume of data generated in different locations represents a major obstacle to the appropriate design, development, and deployment of more elaborate intelligent systems. In this context, we present an open and distributed architecture based on a multiagent system for recognizing human movements, identifying human postures, and detecting harmful activities. The proposed system evolved from a single node for fall detection to a multisensor hardware solution capable of identifying unhampered falls and analyzing the users’ movement. The experiments carried out cover two different scenarios and demonstrate the accuracy of our proposal as a real distributed movement monitoring and accident detection system. Moreover, we also characterize its performance, enabling future analyses and comparisons with similar approaches. Filipe Felisberto, Rosalía Laza, Florentino Fdez-Riverola, and António Pereira Copyright © 2015 Filipe Felisberto et al. All rights reserved. RecRWR: A Recursive Random Walk Method for Improved Identification of Diseases Sun, 22 Mar 2015 09:18:34 +0000 High-throughput methods such as next-generation sequencing or DNA microarrays lack precision, as they return hundreds of genes for a single disease profile.
Several computational methods applied to physical protein interaction networks have been successfully used to identify the best disease candidates for each expression profile. An open problem for these methods is how to combine and take advantage of the wealth of publicly available biomedical data. We propose an enhanced method to improve the selection of the best disease targets in a multilayer biomedical network that integrates PPI data annotated with stable knowledge from OMIM diseases and GO biological processes. We present a comprehensive validation that demonstrates the advantage of the proposed approach, Recursive Random Walk with Restarts (RecRWR). The results highlight the superiority of RecRWR in identifying disease candidates, especially at high levels of biological noise, and show that it benefits from all available data. Joel Perdiz Arrais and José Luís Oliveira Copyright © 2015 Joel Perdiz Arrais and José Luís Oliveira. All rights reserved. Probabilistic Inference of Biological Networks via Data Integration Sun, 22 Mar 2015 09:02:27 +0000 There is significant interest in inferring the structure of subcellular interaction networks. Here we consider supervised interactive network inference, in which a reference set of known network links and nonlinks is used to train a classifier for predicting new links. Many types of data are relevant to inferring functional links between genes, motivating the use of data integration. We use pairwise kernels to predict novel links, along with multiple kernel learning to integrate distinct sources of data into a decision function. We evaluate various pairwise kernels to establish which are most informative and compare individual kernel accuracies with accuracies for weighted combinations.
By associating a probability measure with classifier predictions, we enable cautious classification, which can increase accuracy by restricting predictions to high-confidence instances, and data cleaning, which can mitigate the influence of mislabeled training instances. Although one pairwise kernel (the tensor product pairwise kernel) appears to work best, different kernels may contribute complementary information about interactions: experiments in S. cerevisiae (yeast) reveal that a weighted combination of pairwise kernels applied to different types of data yields the highest predictive accuracy. Combined with cautious classification and data cleaning, we can achieve predictive accuracies of up to 99.6%. Mark F. Rogers, Colin Campbell, and Yiming Ying Copyright © 2015 Mark F. Rogers et al. All rights reserved. aCGH-MAS: Analysis of aCGH by means of Multiagent System Sun, 22 Mar 2015 08:55:39 +0000 Various techniques, such as CGH arrays, are currently available to study genetic variations in patients. CGH arrays analyze gains and losses in different regions of the chromosome. Regions with gains or losses in pathologies are important for selecting relevant genes or CNVs (copy-number variations) associated with the variations detected within chromosomes. Information on mutations, genes, proteins, variations, CNVs, and diseases is spread across different databases, and it would be useful to integrate these sources to extract relevant information. This work proposes a multiagent system to manage the information of aCGH arrays, with the aim of providing an intuitive and extensible system to analyze and interpret the results. The agent roles integrate statistical techniques to select relevant variations, visualization techniques to interpret the final results, and a case-based reasoning (CBR) system to extract relevant information from different sources. Juan F.
De Paz, Rocío Benito, Javier Bajo, Ana Eugenia Rodríguez, and María Abáigar Copyright © 2015 Juan F. De Paz et al. All rights reserved. Gene Knockout Identification Using an Extension of Bees Hill Flux Balance Analysis Sun, 22 Mar 2015 08:47:09 +0000 Microbial strain optimisation for the overproduction of a desired phenotype has been a popular topic in recent years. Gene knockout is a genetic engineering technique that can modify the metabolism of microbial cells to obtain desirable phenotypes. Optimisation algorithms have been developed to identify the effects of gene knockout. However, the complexity of metabolic networks makes it challenging to identify the effects of genetic modification on desirable phenotypes. Furthermore, the vast number of reactions in cellular metabolism often turns the search for optimal gene knockouts into a combinatorial problem, and the computational time increases exponentially with the size of the problem. This work reports an extension of Bees Hill Flux Balance Analysis (BHFBA) to identify optimal gene knockouts that maximise the production yield of desired phenotypes while sustaining the growth rate. The proposed method integrates OptKnock into BHFBA to validate the results automatically. The results show that the extension of BHFBA is suitable, reliable, and applicable for predicting gene knockouts. In several experiments on Escherichia coli, Bacillus subtilis, and Clostridium thermocellum as model organisms, the extension of BHFBA showed better performance in terms of computational time, stability, growth rate, and production yield of desired phenotypes. Yee Wen Choon, Mohd Saberi Mohamad, Safaai Deris, Chuii Khim Chong, Sigeru Omatu, and Juan Manuel Corchado Copyright © 2015 Yee Wen Choon et al. All rights reserved. Modelling the Longevity of Dental Restorations by means of a CBR System Thu, 19 Mar 2015 14:23:43 +0000 The lifespan of dental restorations is limited.
Longevity depends on the material used and the characteristics of the individual tooth. However, the best and longest-lasting material is not always used, since patients may prefer different treatments depending on how noticeable the material is. Over the last 100 years, the most commonly used material has been silver amalgam, which, while very durable, is somewhat aesthetically displeasing. Our study is based on collecting data from the charts, notes, and radiographic information of restorative treatments performed by Dr. Vera in 1993, analysing this information with artificial intelligence techniques to determine the most appropriate restoration, and monitoring the evolution of the dental restorations. The data will be treated confidentially in accordance with Organic Law 15/1999 of 13 December on the Protection of Personal Data. This paper also presents a clustering technique capable of identifying the most significant cases with which to instantiate the case base. To classify the cases, a mixture of experts incorporating a Bayesian network and a multilayer perceptron is used; the two classifiers are combined by a neural network. Ignacio J. Aliaga, Vicente Vera, Juan F. De Paz, Alvaro E. García, and Mohd Saberi Mohamad Copyright © 2015 Ignacio J. Aliaga et al. All rights reserved. Bladder Carcinoma Data with Clinical Risk Factors and Molecular Markers: A Cluster Analysis Thu, 19 Mar 2015 13:41:50 +0000 Bladder cancer occurs in the epithelial lining of the urinary bladder and is amongst the most common types of cancer in humans, killing thousands of people a year. This paper is based on the hypothesis that clinical and histopathological data, together with information on the concentrations of various molecular markers in patients, are useful for predicting outcomes and designing treatments for nonmuscle invasive bladder carcinoma (NMIBC).
A population of 45 patients with a new diagnosis of NMIBC was selected. Patients with benign prostatic hyperplasia (BPH), muscle invasive bladder carcinoma (MIBC), carcinoma in situ (CIS), and recurrent NMIBC tumors were excluded because of their different clinical behavior. Clinical history was obtained by means of anamnesis and physical examination, and preoperative imaging and urine cytology were carried out for all patients. Patients then underwent conventional transurethral resection of bladder tumor (TURBT), and proteomic analyses quantified the biomarkers p53, neu, and EGFR. A postoperative follow-up was performed to detect relapse and progression. Clustering was performed to find groups based on clinical factors, molecular markers, histopathological prognostic factors, and statistics on recurrence, progression, and overall survival of patients with NMIBC. Four groups were found according to tumor size, risk of relapse or progression, and biological behavior. Outlier patients were also detected and categorized according to their clinical characteristics and biological behavior. Enrique Redondo-Gonzalez, Leandro Nunes de Castro, Jesús Moreno-Sierra, María Luisa Maestro de las Casas, Vicente Vera-Gonzalez, Daniel Gomes Ferrari, and Juan Manuel Corchado Copyright © 2015 Enrique Redondo-Gonzalez et al. All rights reserved.
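The bladder carcinoma abstract does not say which clustering algorithm was used, so the following Python sketch substitutes plain k-means (Lloyd's algorithm) together with a simple distance-to-centroid rule for flagging outliers. It is only meant to illustrate the general idea of grouping patients by numeric features and spotting patients that fit no group well. The function names, the `factor` threshold, and the two-feature toy data are hypothetical, and real clinical variables would first need encoding and scaling.

```python
import math


def kmeans(points, k, iters=100):
    """Lloyd's k-means. The first k points seed the centroids, which keeps
    the sketch deterministic but is not a recommended initialisation."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def outliers(points, labels, centroids, factor=2.0):
    """Indices of points lying more than `factor` times the mean
    point-to-centroid distance away from their own centroid."""
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    mean_dist = sum(dists) / len(dists)
    return [i for i, d in enumerate(dists) if d > factor * mean_dist]


if __name__ == "__main__":
    # Hypothetical two-feature patient vectors (assumed already scaled).
    patients = [[1.0, 1.0], [9.0, 9.0], [1.2, 0.8], [0.9, 1.1],
                [8.8, 9.2], [9.1, 8.9], [1.0, 5.0]]
    labels, centroids = kmeans(patients, k=2)
    print(labels, outliers(patients, labels, centroids))
    # -> [0, 1, 0, 0, 1, 1, 0] [6]
```

Here the seventh toy patient is absorbed into the first cluster but still flagged as an outlier, which loosely parallels the abstract's separation between the groups found and the outlier patients categorized afterwards.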