The 30 most discriminant features of each image were obtained by combining three statistical feature selection techniques: Fisher, Probability of Error plus Average Correlation, and Mutual Information (F + PA + MI). Clustering of the selected texture data was verified with nonlinear discriminant analysis, while linear discriminant analysis was applied to the multispectral data. For classification, the texture and multispectral data were fed to an artificial neural network (ANN: n-class). Using an 80-20 cross-validation split, we obtained accuracies of 91.332% for the texture data and 96.40% for the multispectral data.
Salman Qadri, Dost Muhammad Khan, Farooq Ahmad, Syed Furqan Qadri, Masroor Ellahi Babar, Muhammad Shahid, Muzammil Ul-Rehman, Abdul Razzaq, Syed Shah Muhammad, Muhammad Fahad, Sarfraz Ahmad, Muhammad Tariq Pervez, Nasir Naveed, Naeem Aslam, Mutiullah Jamil, Ejaz Ahmad Rehmani, Nazir Ahmad, and Naeem Akhtar Khan
Copyright © 2016 Salman Qadri et al. All rights reserved.

Finding Clocks in Genes: A Bayesian Approach to Estimate Periodicity
Thu, 02 Jun 2016 13:49:37 +0000
Identification of rhythmic gene expression, from metabolic cycles to circadian rhythms, is crucial for understanding the gene regulatory networks and functions of these biological processes. Recently, two algorithms, JTK_CYCLE and ARSER, have been developed to estimate the periodicity of rhythmic gene expression. JTK_CYCLE performs well for long or less noisy time series, while ARSER performs well for detecting a single rhythmic category. However, observing gene expression at high temporal resolution is not always feasible, and many scientists are interested in exploring both ultradian and circadian rhythmic categories simultaneously. In this paper, a new algorithm, named autoregressive Bayesian spectral regression (ABSR), is proposed.
It estimates the period of time-course experimental data and classifies gene expression profiles into multiple rhythmic categories simultaneously. Simulation studies show that, for data with low temporal resolution, ABSR substantially improves the accuracy of periodicity estimation and of clustering into rhythmic categories compared with JTK_CYCLE and ARSER. Moreover, ABSR is insensitive to rhythmic patterns. The new scheme is applied to existing time-course mouse liver data to estimate the period of rhythms and classify genes into ultradian, circadian, and arrhythmic categories. We observe that 49.2% of the circadian profiles detected by JTK_CYCLE at 1-hour resolution are also detected by ABSR at only 4-hour resolution.
Yan Ren, Christian I. Hong, Sookkyung Lim, and Seongho Song
Copyright © 2016 Yan Ren et al. All rights reserved.

Therapeutic Effects of CUR-Activated Human Umbilical Cord Mesenchymal Stem Cells on 1-Methyl-4-phenylpyridine-Induced Parkinson’s Disease Cell Model
Tue, 31 May 2016 07:13:19 +0000
The purpose of this study is to evaluate the therapeutic effects of human umbilical cord-derived mesenchymal stem cells (hUC-MSC) activated by curcumin (CUR) on PC12 cells exposed to 1-methyl-4-phenylpyridinium ion (MPP+), a cell model of Parkinson’s disease (PD). Supernatants of hUC-MSC and of hUC-MSC activated by 5 µmol/L CUR (hUC-MSC-CUR) were collected at the same concentration. Cell proliferation, differentiation into dopaminergic neuronal cells, and antioxidant effects were then assessed in PC12 cells treated with either of the two supernatants or with 5 µmol/L CUR. The results showed that hUC-MSC-CUR more markedly promoted proliferation and the expression of tyrosine hydroxylase (TH) and microtubule-associated protein 2 (MAP2), and significantly decreased the levels of nitric oxide (NO) and the expression of inducible nitric oxide synthase (iNOS) in PC12 cells.
Furthermore, cytokine measurements indicated that the expression of IL-6, IL-10, and NGF was significantly higher in the group treated with hUC-MSC-CUR than in the other two groups. Therefore, hUC-MSC-CUR may be a potential strategy for promoting the proliferation and differentiation of cells in the PD model, providing new insights into a novel therapeutic approach for PD.
Li Jinfeng, Wang Yunliang, Liu Xinshan, Wang Yutong, Wang Shanshan, Xue Peng, Yang Xiaopeng, Xu Zhixiu, Lu Qingshan, Yin Honglei, Cao Xia, Wang Hongwei, and Cao Bingzhen
Copyright © 2016 Li Jinfeng et al. All rights reserved.

The Effects of Real-Time Interactive Multimedia Teleradiology System
Tue, 17 May 2016 14:18:35 +0000
This study describes the design of a real-time interactive multimedia teleradiology system and assesses how referring physicians use the system in point-of-care situations and how it supports or hinders aspects of physician-radiologist interaction. We developed a real-time multimedia teleradiology management system that automates the transfer of images and radiologists’ reports, and we surveyed physicians to triangulate the findings and to verify the realism and results of the experiment. The web-based survey was delivered to 150 physicians from a range of specialties, and 72% of them completed it. The data showed a correlation between rich interactivity, satisfaction, and effectiveness. The results suggest that real-time multimedia teleradiology systems are valued by referring physicians and may enhance their practice and improve patient care. They also highlight the critical role of multimedia technologies in providing real-time multimode interactivity in current medical care.
Lilac Al-Safadi
Copyright © 2016 Lilac Al-Safadi. All rights reserved.
Impacts of Nonsynonymous Single Nucleotide Polymorphisms of Adiponectin Receptor 1 Gene on Corresponding Protein Stability: A Computational Approach
Sun, 15 May 2016 10:00:10 +0000
Despite the reported association of adiponectin receptor 1 (ADIPOR1) gene mutations with vulnerability to several human metabolic diseases, there is a lack of computational analysis of the functional and structural impacts of single nucleotide polymorphisms (SNPs) of human ADIPOR1 at the protein level. Therefore, sequence- and structure-based computational tools were employed in this study to functionally and structurally characterize the coding nonsynonymous SNPs (nsSNPs) of the ADIPOR1 gene listed in the dbSNP database. Our in silico analysis with SIFT, nsSNPAnalyzer, PolyPhen-2, Fathmm, I-Mutant 2.0, SNPs&GO, PhD-SNP, PANTHER, and SNPeffect identified, out of 74 nsSNPs of the ADIPOR1 gene, the nsSNPs with distorting functional impacts, namely, rs765425383 (A348G), rs752071352 (H341Y), rs759555652 (R324L), rs200326086 (L224F), and rs766267373 (L143P). Finally, these five deleterious nsSNPs were introduced into the X-ray crystal structure of the ADIPOR1 protein using the Swiss-PDB Viewer package, and the resulting changes in free energy were computed. Although the free energy increased for all the mutants, H341Y caused the largest increase. RMSD and TM scores indicated that the mutants were structurally similar to the wild-type protein. Our analyses suggest that these variants, especially H341Y, could directly or indirectly destabilize the amino acid interactions and hydrogen bonding networks of ADIPOR1.
Md. Abu Saleh, Md. Solayman, Sudip Paul, Moumoni Saha, Md. Ibrahim Khalil, and Siew Hua Gan
Copyright © 2016 Md. Abu Saleh et al. All rights reserved.
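The ADIPOR1 abstract judges structural similarity with RMSD and TM-score but does not say how they were computed. As a rough illustration only, the sketch below computes both for two Cα coordinate sets, assuming the wild-type and mutant structures are already superposed and matched residue by residue. The function names and coordinates are hypothetical, not the authors' pipeline. The TM-score is evaluated for that single fixed superposition, whereas the published TM-score searches for the superposition that maximizes it, and the clamp of d0 at 0.5 Å for short chains is an assumption of this sketch.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Coord], coords_b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two pre-superposed, residue-matched
    coordinate lists. No structural fitting is performed here."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Sum of squared per-axis differences over all matched residues.
    squared = sum((p - q) ** 2 for a, b in zip(coords_a, coords_b) for p, q in zip(a, b))
    return math.sqrt(squared / len(coords_a))


def tm_score_fixed(model: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """TM-score for one fixed superposition, normalized by the reference length.
    The real TM-score maximizes this sum over superpositions; that search is omitted."""
    length = len(reference)
    if length == 0 or length != len(model):
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Length-dependent distance scale d0 (angstroms) from the TM-score definition.
    d0 = 1.24 * max(length - 15, 0) ** (1.0 / 3.0) - 1.8
    # Assumption of this sketch: keep d0 positive for short chains.
    d0 = max(d0, 0.5)
    total = sum(1.0 / (1.0 + (math.dist(m, r) / d0) ** 2) for m, r in zip(model, reference))
    return total / length


if __name__ == "__main__":
    # Hypothetical three-residue C-alpha traces; the "mutant" is shifted 1 angstrom along x.
    wild_type = [(0.0, 0.0, 0.0), (3.5, 0.0, 0.0), (7.0, 0.0, 0.0)]
    mutant = [(x + 1.0, y, z) for x, y, z in wild_type]
    print(rmsd(wild_type, mutant))                       # 1.0
    print(tm_score_fixed(wild_type, wild_type))          # 1.0
    print(round(tm_score_fixed(mutant, wild_type), 3))   # 0.2
```

A low RMSD and a TM-score close to 1 both indicate that a mutant keeps the wild-type fold; TM-score is less dominated by a few large local deviations because each residue's contribution is bounded by 1.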
Detecting Susceptibility to Breast Cancer with SNP-SNP Interaction Using BPSOHS and Emotional Neural Networks
Wed, 11 May 2016 08:24:34 +0000
Studies of the association between diseases and informative single nucleotide polymorphisms (SNPs) have received great attention. However, most of them simply use the whole set of useful SNPs and ignore SNP-SNP interactions, even though such interactions have been demonstrated in biological experiments. In this paper, we use a binary particle swarm optimization with hierarchical structure (BPSOHS) algorithm to improve the effectiveness of PSO in identifying SNP-SNP interactions. Furthermore, to use these SNP interactions in susceptibility analysis, we propose an emotional neural network (ENN) that treats SNP interactions as an emotional tendency. Unlike a conventional architecture, and by analogy with the emotional brain, this architecture provides a dedicated path for processing the emotional value, so that SNP interactions can be considered more quickly and directly. The ENN lets us use prior knowledge about SNP interactions together with other influencing factors. Finally, the experimental results show that the proposed BPSOHS_ENN algorithm can detect informative SNP-SNP interactions and predict breast cancer risk with much higher accuracy than existing methods.
Xiao Wang, Qinke Peng, and Yue Fan
Copyright © 2016 Xiao Wang et al. All rights reserved.

The Use of Protein-Protein Interactions for the Analysis of the Associations between PM2.5 and Some Diseases
Sun, 08 May 2016 11:57:09 +0000
Nowadays, pollution levels are rapidly increasing all over the world. One of the most important pollutants is PM2.5. Environmental pollution is known to cause several problems, such as the greenhouse effect and acid rain. The most important of these is that pollutants can induce a number of serious diseases.
Some studies have reported that PM2.5 is an important etiologic factor for lung cancer. In this study, we extensively investigate the associations between PM2.5 and the 22 disease classes recommended by Goh et al., such as respiratory, cardiovascular, and gastrointestinal diseases. Protein-protein interactions were used to measure the linkage between disease genes and genes reported to be modulated by PM2.5. The results suggest that some diseases, such as ear, nose, and throat diseases and gastrointestinal, nutritional, renal, and cardiovascular diseases, are influenced by PM2.5, and supporting evidence is provided for these results. For example, a total of 18 genes related to cardiovascular diseases are identified as closely related to PM2.5, and the cardiovascular disease gene DSP is significantly related to the PM2.5-related gene JUP.
Qing Zhang, Pei-Wei Zhang, and Yu-Dong Cai
Copyright © 2016 Qing Zhang et al. All rights reserved.

Bioinformatics Applications in Life Sciences and Technologies
Wed, 04 May 2016 12:59:07 +0000
Sílvia A. Sousa, Jorge H. Leitão, Raul C. Martins, João M. Sanches, Jasjit S. Suri, and Alejandro Giorgetti
Copyright © 2016 Sílvia A. Sousa et al. All rights reserved.

Differential Proteomics Analysis of Colonic Tissues in Patients of Slow Transit Constipation
Sat, 30 Apr 2016 14:08:05 +0000
Objective. To screen for proteins differentially expressed between patients with slow transit constipation (STC) and normal controls using a comparative proteomic approach. Methods. Two-dimensional electrophoresis was applied to separate the proteins in specimens from 5 STC patients and 5 normal controls. Proteins with statistically significant differential expression between the two groups were identified by computer-aided image analysis and matrix-assisted laser desorption/ionization tandem time-of-flight mass spectrometry (MALDI-TOF-MS). Results.
A total of 239 protein spots were identified in the average gel of the normal controls and 215 in that of the patients with STC. A total of 197 protein spots were matched, and the mean matching rate was 82%. Fourteen protein spots showed statistically significant differences in expression: 12 were markedly increased and 2 were significantly decreased. Conclusion. The proteomic expression profile of colonic specimens from STC patients differs significantly from that of normal controls, which may be associated with the pathogenesis of STC.
Songlin Wan, Weicheng Liu, Cuiping Tian, Xianghai Ren, Zhao Ding, Qun Qian, Congqing Jiang, and Yunhua Wu
Copyright © 2016 Songlin Wan et al. All rights reserved.

Discovery of Azurin-Like Anticancer Bacteriocins from Human Gut Microbiome through Homology Modeling and Molecular Docking against the Tumor Suppressor p53
Sat, 30 Apr 2016 13:22:47 +0000
Azurin from Pseudomonas aeruginosa is a known anticancer bacteriocin that can specifically penetrate human cancer cells and induce apoptosis. We hypothesized that pathogenic and commensal bacteria with long-term residence in the human body can produce azurin-like bacteriocins as a weapon against the invasion of cancers. In our previous work, putative bacteriocins were screened from the complete genomes of 66 dominant bacterial species in the human gut microbiota and subsequently characterized by subjecting them to functional annotation algorithms, with azurin as a control. We qualitatively predicted 14 putative bacteriocins with functional properties very similar to those of azurin. In this work, we perform a number of quantitative and structure-based analyses, including hydrophobic percentage calculation, structural modeling, and molecular docking of the bacteriocins of interest against the protein p53, a cancer target.
Finally, we identified 8 putative bacteriocins that bind p53 in the same manner as p28-azurin and azurin: 3 peptides (p1seq16, p2seq20, and p3seq24) that were also found in our previous study and 5 novel ones (p1seq09, p2seq05, p2seq08, p3seq02, and p3seq17) reported for the first time. These bacteriocins are suggested for further in vitro tests in different neoplastic cell lines.
Chuong Nguyen and Van Duy Nguyen
Copyright © 2016 Chuong Nguyen and Van Duy Nguyen. All rights reserved.

Predicting Subcellular Localization of Apoptosis Proteins Combining GO Features of Homologous Proteins and Distance Weighted KNN Classifier
Sun, 24 Apr 2016 07:09:15 +0000
Apoptosis proteins play a key role in maintaining the stability of organisms. Their functions are related to their subcellular locations, which help in understanding the mechanism of programmed cell death. In this paper, we use GO annotation information of apoptosis proteins and their homologous proteins, retrieved from the GOA database, to formulate feature vectors, and then combine these features with the distance-weighted KNN classification algorithm to address the data imbalance problem in the CL317 data set and predict the subcellular locations of apoptosis proteins. We find that the number of homologous proteins affects the overall prediction accuracy. With the optimal number of homologous proteins, the overall prediction accuracy of our method on the CL317 data set reaches 96.8% in the jackknife test. Comparison with other existing methods shows that the proposed method is effective and outperforms them in predicting the subcellular localization of apoptosis proteins.
Xiao Wang, Hui Li, Qiuwen Zhang, and Rong Wang
Copyright © 2016 Xiao Wang et al. All rights reserved.

Predicted 3D Model of the Rabies Virus Glycoprotein Trimer
Sun, 24 Apr 2016 06:03:45 +0000
The rabies virus glycoprotein (RABVG) ectodomain is a homotrimer, and the trimers are often called spikes.
They are responsible for attachment of the virus through interaction with nicotinic acetylcholine receptors, the neural cell adhesion molecule (NCAM), and the p75 neurotrophin receptor (p75NTR), which makes them relevant in viral pathogenesis. The antigenic structure differs significantly between trimers and monomers. Surfaces rich in hydrophobic amino acids are important for trimer stabilization, and the C-terminal region of the ectodomain plays an important role in it. To understand these interactions between G proteins, a mechanistic study of their functions was performed with a molecular model of the G protein in its trimeric form, which verified its 3D conformation. The G protein was modeled with the I-TASSER server and evaluated with a Ramachandran plot and the ERRAT program, which gave 84.64% of residues in favorable regions and an overall quality factor of 89.9%. Molecular dynamics simulations of the RABVG trimer were carried out at 310 K, and the RMSD values of the Cα atoms were used to assess stability. A preliminary model of the rabies virus G protein that was stable at 12 ns of molecular dynamics was obtained.
Bastida-González Fernando, Celaya-Trejo Yersin, Correa-Basurto José, and Zárate-Segura Paola
Copyright © 2016 Bastida-González Fernando et al. All rights reserved.

A Comprehensive Curation Shows the Dynamic Evolutionary Patterns of Prokaryotic CRISPRs
Mon, 18 Apr 2016 13:31:39 +0000
Motivation. Clustered regularly interspaced short palindromic repeats (CRISPRs) are genetic elements that actively regulate foreign invading genes in prokaryotic genomes, and CRISPR has been engineered to work with the CRISPR-associated sequence (Cas) gene Cas9 as one of the modern genome editing technologies. Due to inconsistent definitions, existing CRISPR detection programs seem to have missed some weak CRISPR signals. Results.
This study manually curates all currently annotated CRISPR elements in prokaryotic genomes and proposes 95 updates to the annotations. A new definition is proposed to cover all CRISPRs. A comprehensive comparison of CRISPR numbers at both the domain and genus levels shows high variation among closely related species, even within the same genus. A detailed investigation of how CRISPRs are evolutionarily manipulated in the 8 completely sequenced species of the genus Thermoanaerobacter demonstrates that transposons frequently act to split long CRISPRs into shorter ones over a long evolutionary history.
Guoqin Mai, Ruiquan Ge, Guoquan Sun, Qinghan Meng, and Fengfeng Zhou
Copyright © 2016 Guoqin Mai et al. All rights reserved.

The Occurrence of Genetic Alterations during the Progression of Breast Carcinoma
Thu, 14 Apr 2016 09:28:52 +0000
How genetic variations relate to the development of carcinoma, and the order in which they occur, is not completely understood. Interpreting the mechanisms of copy number variation (CNV) is essential for understanding the etiology of genetic disorders. An oncogenetic tree is a special type of phylogenetic tree that gives an inferred pictorial representation of oncogenesis. In the present study, we constructed an oncogenetic tree to model the occurrence of genetic and cytogenetic alterations in human breast cancer. The oncogenetic tree model was built on the CNV of the ErbB2, AKT2, KRAS, PIK3CA, PTEN, and CCND1 genes in 963 human breast tumors with sequencing and copy number alteration (CNA) data from TCGA. Results from the oncogenetic tree model indicate that ErbB2 copy number variation is a frequent early event in human breast cancer. The oncogenetic tree model, being based on the phylogenetic tree, is a type of mathematical model that may eventually provide a better way to understand the process of oncogenesis.
Xiao-Chen Li, Chenglin Liu, Tao Huang, and Yang Zhong
Copyright © 2016 Xiao-Chen Li et al. All rights reserved.

Methylation Status of SP1 Sites within miR-23a-27a-24-2 Promoter Region Influences Laryngeal Cancer Cell Proliferation and Apoptosis
Wed, 23 Mar 2016 12:17:47 +0000
DNA methylation plays critical roles in regulating microRNA expression and function. The miR-23a-27a-24-2 cluster has various functions, and its aberrant expression is a common event in many cancers. However, whether DNA methylation influences the expression and function of the cluster has not been reported. Here we found a CG-rich region spanning two SP1 sites in the cluster promoter region. The SP1 sites in the cluster were demethylated in Hep2 cells and methylated in HEK293 cells. Correspondingly, the cluster was significantly upregulated in Hep2 cells and downregulated in HEK293 cells. When the methyl donor S-adenosyl-L-methionine was introduced into Hep2 cells, the SP1 sites were remethylated and the cluster was significantly downregulated. Moreover, S-adenosyl-L-methionine significantly decreased Hep2 cell viability and promoted Hep2 cell early apoptosis. We also found that the construct with two SP1 sites had the highest luciferase activity and that SP1 specifically bound the gene cluster promoter in vitro. We conclude that demethylated SP1 sites in the miR-23a-27a-24-2 cluster upregulate the cluster's expression, promoting proliferation and inhibiting early apoptosis in laryngeal cancer cells.
Ye Wang, Zhao-Xiong Zhang, Sheng Chen, Guang-Bin Qiu, Zhen-Ming Xu, and Wei-Neng Fu
Copyright © 2016 Ye Wang et al. All rights reserved.

E-Index for Differentiating Complex Dynamic Traits
Tue, 15 Mar 2016 16:43:19 +0000
While it is a daunting challenge in current biology to understand how the underlying network of genes regulates complex dynamic traits, functional mapping, a tool for mapping quantitative trait loci (QTLs) and single nucleotide polymorphisms (SNPs), has been applied in a variety of cases to tackle this challenge.
Though useful and powerful, functional mapping performs well only when one or more model parameters are clearly responsible for the developmental trajectory, which is typically a logistic curve. Moreover, it does not work when the curves are more complex than that, especially when they are not monotonic. To overcome this limitation, we propose a mathematical-biological concept and measurement, the E-index (earliness index), which cumulatively measures how early a variable (or a dynamic trait) increases or decreases its value. Theoretical proofs and simulation studies show that the E-index is more general than functional mapping and can be applied to any complex dynamic trait, including those with logistic curves and those with nonmonotonic curves. In addition, an E-index vector is proposed to capture more subtle differences between developmental patterns.
Jiandong Qi, Jianfeng Sun, and Jianxin Wang
Copyright © 2016 Jiandong Qi et al. All rights reserved.

Using Small RNA Deep Sequencing Data to Detect Human Viruses
Tue, 15 Mar 2016 13:33:25 +0000
Small RNA sequencing (sRNA-seq) can be used to detect viruses in infected hosts without requiring any prior knowledge or specialized sample preparation. The sRNA-seq method was initially used for viral detection and identification in plants and then in invertebrates and fungi. However, its use for detecting mammalian or human viruses is still controversial. In this study, we used 931 sRNA-seq runs from the NCBI SRA database to detect and identify viruses in human cells or tissues, particularly in some clinical samples. Six viruses, HPV-18, HBV, HCV, HIV-1, SMRV, and EBV, were detected in 36 runs. Four of these viruses were consistent with the annotations from previous studies. HIV-1 was found in clinical samples with no HIV-positive reports, and SMRV was found in Diffuse Large B-Cell Lymphoma cells for the first time.
In conclusion, these results suggest that sRNA-seq can be used to detect viruses in mammals and humans.
Fang Wang, Yu Sun, Jishou Ruan, Rui Chen, Xin Chen, Chengjie Chen, Jan F. Kreuze, ZhangJun Fei, Xiao Zhu, and Shan Gao
Copyright © 2016 Fang Wang et al. All rights reserved.

RF-Phos: A Novel General Phosphorylation Site Prediction Tool Based on Random Forest
Tue, 15 Mar 2016 12:13:52 +0000
Protein phosphorylation is one of the most widespread regulatory mechanisms in eukaryotes. Over the past decade, phosphorylation site prediction has emerged as an important problem in bioinformatics. Here, we report a new method, termed Random Forest-based Phosphosite predictor 2.0 (RF-Phos 2.0), that predicts phosphorylation sites given only the primary amino acid sequence of a protein as input. RF-Phos 2.0, which uses a random forest with sequence and structural features, is able to identify putative phosphorylation sites across many protein families. In side-by-side comparisons based on 10-fold cross-validation and an independent dataset, RF-Phos 2.0 compares favorably with other popular mammalian phosphosite prediction methods, such as PhosphoSVM, GPS2.1, and Musite.
Hamid D. Ismail, Ahoi Jones, Jung H. Kim, Robert H. Newman, and Dukka B. KC
Copyright © 2016 Hamid D. Ismail et al. All rights reserved.

SNP Mining in Functional Genes from Nonmodel Species by Next-Generation Sequencing: A Case of Flowering, Pre-Harvest Sprouting, and Dehydration Resistant Genes in Wheat
Mon, 14 Mar 2016 10:48:25 +0000
Because many nonmodel plants lack genomic sequences, combining molecular technologies with next-generation sequencing (NGS) platforms has led to a new approach to studying the genetic variation of these plants. Software such as GATK, SOAPsnp, and samtools is often used to process NGS data. In this study, BLAST was applied to call SNPs from the mixed sequence data of 16 functional genes of polyploid wheat.
In total, 1.2 million reads were obtained, an average of 75,000 reads per gene. To obtain accurate information, 390,992 read pairs were successfully assembled before alignment to the functional genes. Standalone BLAST tools were used to map the assembled sequences to each functional gene. Polynomial fitting was applied to find a suitable minor allele frequency (MAF) threshold, 6%, for the assembled reads of each functional gene. The accuracy of SNPs from assembled reads, pretrimmed reads, and original reads was compared, showing that SNPs mined from the assembled reads were more reliable than the others. It was also demonstrated that sequencing mixed samples by NGS and then analyzing the reads with BLAST is an effective, low-cost, and accurate way to mine SNPs in nonmodel species. Assembled reads and a polynomial-fitted threshold are recommended for more accurate SNP detection.
Zhong-Xu Chen, Mei Deng, and Ji-Rui Wang
Copyright © 2016 Zhong-Xu Chen et al. All rights reserved.

A Multifeatures Fusion and Discrete Firefly Optimization Method for Prediction of Protein Tyrosine Sulfation Residues
Thu, 10 Mar 2016 08:27:20 +0000
Tyrosine sulfation is a ubiquitous protein posttranslational modification in which sulfate groups are added to tyrosine residues. It plays significant roles in various physiological processes in eukaryotic cells. To explore the molecular mechanism of tyrosine sulfation, one prerequisite is to correctly identify possible protein tyrosine sulfation residues. In this paper, a novel method is presented to predict protein tyrosine sulfation residues from primary sequences. Through informative feature construction and an elaborate feature selection and parameter optimization scheme, the proposed predictor achieved promising results and outperformed many other state-of-the-art predictors.
Using the optimal feature subset, the proposed method achieved a mean MCC of 94.41% on the benchmark dataset and an MCC of 90.09% on the independent dataset. These results indicate that the proposed method could be effective in identifying this important protein posttranslational modification and that the feature selection scheme could be powerful in research on protein functional residue prediction.
Song Guo, Chunhua Liu, Peng Zhou, and Yanling Li
Copyright © 2016 Song Guo et al. All rights reserved.

Treating Diabetes Mellitus: Pharmacophore Based Designing of Potential Drugs from Gymnema sylvestre against Insulin Receptor Protein
Sun, 28 Feb 2016 14:16:49 +0000
Diabetes mellitus (DM) is one of the most prevalent metabolic disorders and can severely affect quality of life. Injectable insulin, which is currently used to treat DM, is associated mainly with patient inconvenience. Small molecules that can act as insulin receptor (IR) agonists would be better alternatives to insulin injection. Herein, ten bioactive small compounds derived from Gymnema sylvestre (G. sylvestre) were chosen to determine their IR binding affinity and ADMET properties using a combined approach of molecular docking and computational pharmacokinetic elucidation. Structural analogues were also designed for the compounds associated with toxicity and lower IR affinity. Among the ten parent compounds, six were found to have significant pharmacokinetic properties with considerable binding affinity towards IR, while four compounds were associated with toxicity and lower IR affinity. Among the forty structural analogues, four compounds demonstrated considerably increased binding affinity towards IR and lower toxicity compared with the parent compounds. Finally, molecular interaction analysis revealed that the six parent compounds and four analogues interact with the active-site amino acids of IR.
This study may therefore help identify new therapeutics and alternatives to insulin for diabetic patients.
Mohammad Uzzal Hossain, Md. Arif Khan, S. M. Rakib-Uz-Zaman, Mohammad Tuhin Ali, Md. Saidul Islam, Chaman Ara Keya, and Md. Salimullah
Copyright © 2016 Mohammad Uzzal Hossain et al. All rights reserved.

Identification of Deleterious Mutations in Myostatin Gene of Rohu Carp (Labeo rohita) Using Modeling and Molecular Dynamic Simulation Approaches
Thu, 25 Feb 2016 15:28:07 +0000
Myostatin (MSTN) is a known negative regulator of skeletal muscle growth. Mutated myostatin gives rise to a double-muscled phenotype, which is of positive significance for farmed animals. However, adequate information is not available for teleosts, including the farmed rohu carp, Labeo rohita. In the absence of experimental evidence, computational algorithms were used to predict the impact of point mutations on rohu myostatin, especially on its structural and functional relationships. Four mutations were generated at different positions (p.D76A, p.Q204P, p.C312Y, and p.D313A) of the rohu MSTN protein. The impact of each mutant was analyzed using SIFT, I-Mutant 2.0, PANTHER, and PROVEAN, and two substitutions (p.D76A and p.Q204P) were predicted to be deleterious. Each mutant protein was compared structurally with the native protein using 3D modeling and molecular dynamics simulation. The simulation showed altered dynamic behavior in RMSD and RMSF for each of the p.D76A and p.Q204P substitutions compared with the native counterpart. Interestingly, the two incorporated mutations had a significant negative impact on protein structure and stability. The present study provides first-hand information on possible amino acids where mutations could be introduced into the MSTN gene of rohu carp, as well as other carps, for further in vivo studies.
Kiran Dashrath Rasal, Vemulawada Chakrapani, Swagat Kumar Patra, Shibani D. Mohapatra, Swapnarani Nayak, Sasmita Jena, Jitendra Kumar Sundaray, Pallipuram Jayasankar, and Hirak Kumar Barman
Copyright © 2016 Kiran Dashrath Rasal et al. All rights reserved.

VGSC: A Web-Based Vector Graph Toolkit of Genome Synteny and Collinearity
Wed, 24 Feb 2016 10:02:32 +0000
Background. To understand the colocalization of genetic loci among species, synteny and collinearity analysis is a frequent task in comparative genomics research. However, many analysis software packages do not visualize results effectively. Problems include a lack of graphic visualization, simplistic representation, or nonextensible output formats. Moreover, higher-throughput sequencing technology requires higher-resolution image output. Implementation. To fill this gap, this paper presents VGSC, the Vector Graph toolkit of genome Synteny and Collinearity, and its online service, which visualize synteny and collinearity in common graphical formats, including both raster (JPEG, Bitmap, and PNG) and vector (SVG, EPS, and PDF) formats. Result. Users can upload sequence alignments from BLAST and collinearity relationships from synteny analysis tools. The website generates the vector or raster graphical results automatically. We also provide a Java-based bytecode binary that enables command-line execution.
Yiqing Xu, Changwei Bi, Guoxin Wu, Suyun Wei, Xiaogang Dai, Tongming Yin, and Ning Ye
Copyright © 2016 Yiqing Xu et al. All rights reserved.

Motif-Based Text Mining of Microbial Metagenome Redundancy Profiling Data for Disease Classification
Sun, 14 Feb 2016 14:02:26 +0000
Background. Text data of 16S rRNA are informative for classifying microbiota-associated diseases.
However, the raw text data need to be systematically processed so that features for classification can be defined and extracted; moreover, the high-dimensional feature spaces generated by the text data pose an additional difficulty. Results. Here we present a Phylogenetic Tree-Based Motif Finding algorithm (PMF) to analyze 16S rRNA text data. By integrating phylogenetic rules and other statistical indexes for classification, we can effectively reduce the dimension of the large feature spaces generated by the text datasets. Using the retrieved motifs in combination with common classification methods, we can discriminate different samples of both pneumonia and dental caries better than other existing methods. Conclusions. We extend phylogenetic approaches to perform supervised learning on microbiota text data to discriminate the pathological states of pneumonia and dental caries. The results show that PMF may enhance the efficiency and reliability of analyzing high-dimensional text data. Yin Wang, Rudong Li, Yuhua Zhou, Zongxin Ling, Xiaokui Guo, Lu Xie, and Lei Liu Copyright © 2016 Yin Wang et al. All rights reserved. Advancements in RNASeqGUI towards a Reproducible Analysis of RNA-Seq Experiments Wed, 10 Feb 2016 13:50:15 +0000 We present the advancements and novelties recently introduced in RNASeqGUI, a graphical user interface that helps biologists to handle and analyse large datasets collected in RNA-Seq experiments. This work focuses on the concept of reproducible research and shows how it has been incorporated in RNASeqGUI to provide reproducible (computational) results. The new version of RNASeqGUI combines graphical interfaces with tools for reproducible research, such as literate statistical programming, human-readable reports, parallel execution, caching, and interactive and web-explorable tables of results. These features allow the user to analyse big datasets in a fast, efficient, and reproducible way.
Moreover, this paper represents a proof of concept, showing a simple way to develop computational tools for Life Science in the spirit of reproducible research. Francesco Russo, Dario Righelli, and Claudia Angelini Copyright © 2016 Francesco Russo et al. All rights reserved. A Prediction Model for Membrane Proteins Using Moments Based Features Sun, 07 Feb 2016 15:42:21 +0000 The most fundamental unit of the human body is the cell. Encapsulated within the cell are many minute entities and molecules, which are protected by the cell membrane. The proteins associated with this lipid-based bilayer membrane are known as membrane proteins and are considered to play a significant role. These membrane proteins act in cellular activities both inside and outside the cell. According to scientists in pharmaceutical organizations, membrane proteins perform key tasks in drug interactions. In this study, a technique is presented that is based on various computationally intelligent methods for predicting membrane proteins without the experimental use of mass spectrometry. Statistical moments were used to extract features, and a multilayer neural network was trained with backpropagation to predict membrane proteins. Results show that the proposed technique performs better than existing methodologies. Ahmad Hassan Butt, Sher Afzal Khan, Hamza Jamil, Nouman Rasool, and Yaser Daanial Khan Copyright © 2016 Ahmad Hassan Butt et al. All rights reserved. PIPINO: A Software Package to Facilitate the Identification of Protein-Protein Interactions from Affinity Purification Mass Spectrometry Data Sun, 07 Feb 2016 14:17:44 +0000 The functionality of most proteins is regulated by protein-protein interactions. Hence, the comprehensive characterization of the interactome is the next milestone on the path to understanding the biochemistry of the cell.
A powerful method to detect protein-protein interactions is a combination of coimmunoprecipitation or affinity purification with quantitative mass spectrometry. Nevertheless, both methods tend to precipitate a high number of background proteins due to nonspecific interactions. To address this challenge, the software Protein-Protein-Interaction-Optimizer (PIPINO) was developed to perform an automated data analysis, to facilitate the selection of bona fide binding partners, and to compare the dynamics of interaction networks. In this study we investigated the STAT1 interaction network and its activation-dependent dynamics. Stable isotope labeling by amino acids in cell culture (SILAC) was applied to analyze the STAT1 interactome after streptavidin pull-down of biotagged STAT1 from human embryonic kidney 293T cells with and without activation. Starting from more than 2,000 captured proteins, 30 potential STAT1 interaction partners were extracted. Interestingly, more than 50% of these had already been reported or predicted to bind STAT1. Furthermore, 16 proteins, such as STAT3 and the importin subunits alpha 1 and alpha 6, were found to change their binding behavior depending on STAT1 phosphorylation. Stefan Kalkhof, Stefan Schildbach, Conny Blumert, Friedemann Horn, Martin von Bergen, and Dirk Labudde Copyright © 2016 Stefan Kalkhof et al. All rights reserved. Analysis and Identification of Aptamer-Compound Interactions with a Maximum Relevance Minimum Redundancy and Nearest Neighbor Algorithm Wed, 03 Feb 2016 06:40:35 +0000 The development of biochemistry and molecular biology has revealed an increasingly important role of compounds in several biological processes. Like aptamer-protein interactions, aptamer-compound interactions attract increasing attention. However, it is time-consuming to select proper aptamers against compounds using traditional methods, such as exponential enrichment.
Thus, there is an urgent need to design effective computational methods for searching for effective aptamers against compounds. This study attempted to extract important features for aptamer-compound interactions using feature selection methods, such as Maximum Relevance Minimum Redundancy, as well as incremental feature selection. Each aptamer-compound pair was represented by properties derived from the aptamer and compound, including frequencies of single nucleotides and dinucleotides for the aptamer, as well as the constitutional, electrostatic, quantum-chemical, and space conformational descriptors of the compounds. As a result, some important features were obtained. To confirm the importance of the obtained features, we further discussed the associations between them and aptamer-compound interactions. In addition, an optimal prediction model based on the nearest neighbor algorithm was built to identify aptamer-compound interactions; it has the potential to be a useful tool for identifying novel aptamer-compound interactions. The program is available upon request. ShaoPeng Wang, Yu-Hang Zhang, Jing Lu, Weiren Cui, Jerry Hu, and Yu-Dong Cai Copyright © 2016 ShaoPeng Wang et al. All rights reserved. ChemTok: A New Rule Based Tokenizer for Chemical Named Entity Recognition Thu, 28 Jan 2016 06:46:23 +0000 Named Entity Recognition (NER) from text constitutes the first step in many text mining applications. The most important preliminary step for NER systems using machine learning approaches is tokenization, where raw text is segmented into tokens. This study proposes an enhanced rule-based tokenizer, ChemTok, which utilizes rules extracted mainly from the training data set. The main novelty of ChemTok is the use of the extracted rules to merge tokens split in the previous steps, thus producing longer and more discriminative tokens. ChemTok is compared to the tokenization methods utilized by ChemSpot and tmChem.
Support Vector Machines and Conditional Random Fields are employed as the learning algorithms. The experimental results show that the classifiers trained on the output of ChemTok outperform all classifiers trained on the output of the other two tokenizers in terms of both classification performance and the number of incorrectly segmented entities. Abbas Akkasi, Ekrem Varoğlu, and Nazife Dimililer Copyright © 2016 Abbas Akkasi et al. All rights reserved. Inhibition of DNA Topoisomerase Type IIα (TOP2A) by Mitoxantrone and Its Halogenated Derivatives: A Combined Density Functional and Molecular Docking Study Wed, 27 Jan 2016 09:17:56 +0000 In this study, mitoxantrone and its halogenated derivatives have been designed by density functional theory (DFT) to explore their structural and thermodynamic properties. The ability of these drugs to inhibit DNA topoisomerase type IIα (TOP2A) was also evaluated by molecular docking calculations. Noncovalent interactions play a significant role in improving the performance of halogenated drugs. The combined quantum and molecular mechanics calculations revealed that the CF3-containing drug inhibits TOP2A more favorably than the other modified drugs. Md. Abu Saleh, Md. Solayman, Mohammad Mazharol Hoque, Mohammad A. K. Khan, Mohammed G. Sarwar, and Mohammad A. Halim Copyright © 2016 Md. Abu Saleh et al. All rights reserved. Segmenting Brain Tissues from Chinese Visible Human Dataset by Deep-Learned Features with Stacked Autoencoder Tue, 26 Jan 2016 13:58:34 +0000 Cryosection brain images in the Chinese Visible Human (CVH) dataset contain rich anatomical structure information of tissues because of their high resolution (e.g., 0.167 mm per pixel). Fast and accurate segmentation of these images into white matter, gray matter, and cerebrospinal fluid plays a critical role in analyzing and measuring the anatomical structures of the human brain.
However, most existing automated segmentation methods are designed for computed tomography or magnetic resonance imaging data, and they may not be applicable to cryosection images because of differences in imaging. In this paper, we propose a supervised learning-based CVH brain tissue segmentation method that uses a stacked autoencoder (SAE) to automatically learn deep feature representations. Specifically, our model consists of two successive parts: two three-layer SAEs take image patches as input to learn complex anatomical feature representations, and these features are then passed to a Softmax classifier to infer the labels. Experimental results validated the effectiveness of our method and showed that it outperformed four other classical brain tissue detection strategies. Furthermore, we reconstructed three-dimensional surfaces of these tissues, which demonstrate the method's potential for exploring the high-resolution anatomical structures of the human brain. Guangjun Zhao, Xuchu Wang, Yanmin Niu, Liwen Tan, and Shao-Xiang Zhang Copyright © 2016 Guangjun Zhao et al. All rights reserved. Automated Cell Selection Using Support Vector Machine for Application to Spectral Nanocytology Tue, 19 Jan 2016 16:21:10 +0000 Partial wave spectroscopy (PWS) enables quantification of the statistical properties of cell structures at the nanoscale, which has been used to identify patients harboring premalignant tumors by interrogating easily accessible sites distant from the location of the lesion. Because of its high sensitivity, only well-preserved cells should be selected from the smear images for further analysis. To date, such cell selection has been done manually. This process is time-consuming, labor-intensive, and vulnerable to bias, and it has considerable inter- and intraoperator variability. In this study, we developed a classification scheme to identify and remove corrupted cells and debris of no diagnostic value from raw smear images.
The smear slide is digitized by acquiring and stitching low-magnification transmission images. Objects are then extracted from these images through segmentation algorithms. A training set is created by manually classifying objects as suitable or unsuitable. A feature set is created by quantifying a large number of features for each object. The training set and feature set are used to train a selection algorithm using Support Vector Machine (SVM) classifiers. We show that the selection algorithm achieves an accuracy of 93% with a sensitivity of 95%. Qin Miao, Justin Derbas, Aya Eid, Hariharan Subramanian, and Vadim Backman Copyright © 2016 Qin Miao et al. All rights reserved. Identification of Novel RD1 Antigens and Their Combinations for Diagnosis of Sputum Smear−/Culture+ TB Patients Mon, 18 Jan 2016 13:39:48 +0000 Rapid and accurate diagnosis of pulmonary tuberculosis (PTB) is an unresolved problem worldwide, especially for sputum smear− (S−) cases. In this study, five antigen genes, Rv3871, Rv3874, Rv3875, Rv3876, and Rv3879, were cloned from Mycobacterium tuberculosis (Mtb) RD1 and overexpressed to generate antigen fragments. These antigens and their combinations were investigated for PTB serodiagnosis. A total of 298 serum samples were collected from active PTB patients, including 117 sputum smear+ (S+) and sputum culture+ (C+) cases, 101 S−/C+ cases, and 80 S−/C− cases. The serum IgG levels of the five antigens were measured by ELISA. Based on IgG levels, the sensitivity/specificity of Rv3871, Rv3874, Rv3875, Rv3876, and Rv3879 for PTB detection was 81.21%/74.74%, 63.09%/94.78%, 32.21%/87.37%, 62.42%/85.26%, and 83.56%/83.16%, respectively. Furthermore, the optimal result for PTB diagnosis was achieved by combining antigens Rv3871, Rv3876, and Rv3879. In addition, the IgG levels of Rv3871, Rv3876, and Rv3879 were found to be higher in S−/C+ PTB patients than in other PTB populations.
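The sensitivity/specificity pairs reported for the RD1 antigens follow the usual definitions: sensitivity = TP/(TP + FN) over patients and specificity = TN/(TN + FP) over controls. The sketch below illustrates the calculation with made-up IgG readings and a hypothetical cutoff of 0.5; the "positive if any antigen exceeds its cutoff" rule used to combine antigens is an assumption for illustration, not necessarily the combination rule used in the study.

```python
from typing import Dict, List, Sequence, Tuple


def sens_spec(calls: Sequence[bool], truth: Sequence[bool]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of binary test calls against true PTB status."""
    tp = sum(1 for c, t in zip(calls, truth) if c and t)
    fn = sum(1 for c, t in zip(calls, truth) if not c and t)
    tn = sum(1 for c, t in zip(calls, truth) if not c and not t)
    fp = sum(1 for c, t in zip(calls, truth) if c and not t)
    return tp / (tp + fn), tn / (tn + fp)


def combine_any(levels: Dict[str, List[float]], cutoffs: Dict[str, float]) -> List[bool]:
    """Assumed rule: a sample is positive if any antigen's IgG level exceeds its own cutoff."""
    n_samples = len(next(iter(levels.values())))
    return [any(levels[a][i] > cutoffs[a] for a in levels) for i in range(n_samples)]


# Toy data (not from the study): three PTB patients followed by two controls.
truth = [True, True, True, False, False]
levels = {
    "Rv3871": [0.9, 0.2, 0.1, 0.3, 0.1],
    "Rv3879": [0.1, 0.8, 0.2, 0.1, 0.6],
}
cutoffs = {"Rv3871": 0.5, "Rv3879": 0.5}

single = [x > cutoffs["Rv3871"] for x in levels["Rv3871"]]
print(sens_spec(single, truth))                        # (0.333..., 1.0)
print(sens_spec(combine_any(levels, cutoffs), truth))  # (0.666..., 0.5)
```

Even on toy data this shows the usual trade-off: an OR-style combination catches more patients (higher sensitivity) at the cost of more false positives (lower specificity).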
More importantly, the combination of the three antigens demonstrated superior diagnostic performance for both S−/C+ and S−/C− PTB. In conclusion, the combination of Rv3871, Rv3876, and Rv3879 induced a higher IgG response in sputum S−/C+ PTB patients and represents a promising biomarker combination for diagnosing PTB. Zhiqiang Liu, Shuang Qie, Lili Li, Bingshui Xiu, Xiqin Yang, Zhenhua Dai, Xuhui Zhang, Cuimi Duan, Haiping Que, Ping Zhao, Heather Johnson, Heqiu Zhang, and Xiaoyan Feng Copyright © 2016 Zhiqiang Liu et al. All rights reserved. The Subcellular Localization and Functional Analysis of Fibrillarin2, a Nucleolar Protein in Nicotiana benthamiana Sun, 17 Jan 2016 10:20:02 +0000 Nucleolar proteins play important roles in plant cytology, growth, and development. Fibrillarin2 is a nucleolar protein of Nicotiana benthamiana (N. benthamiana). Its cDNA was amplified by RT-PCR and inserted into the expression vector pEarley101 labeled with yellow fluorescent protein (YFP). The fusion protein was localized in the nucleolus and Cajal body of leaf epidermal cells of N. benthamiana. The N. benthamiana fibrillarin2 (NbFib2) protein has three functional domains (i.e., a glycine and arginine rich domain, an RNA-binding domain, and an α-helical domain) and a nuclear localization signal (NLS) at the C-terminus. Protein 3D structure analysis predicted that NbFib2 is an α/β protein. In addition, the virus-induced gene silencing (VIGS) approach was used to determine the function of NbFib2. Our results showed that symptoms including growth retardation, organ deformation, chlorosis, and necrosis appeared in NbFib2-silenced N. benthamiana. Luping Zheng, Jinai Yao, Fangluan Gao, Lin Chen, Chao Zhang, Lingli Lian, Liyan Xie, Zujian Wu, and Lianhui Xie Copyright © 2016 Luping Zheng et al. All rights reserved.
Modification of the Sweetness and Stability of Sweet-Tasting Protein Monellin by Gene Mutation and Protein Engineering Sun, 10 Jan 2016 10:31:15 +0000 The natural sweet protein monellin is intensely sweet and low in calories, suggesting its potential in food applications. However, its low heat and acid resistance limits its application. In this study, we show that the thermostability of monellin can be improved without a decrease in sweetness by means of sequence and structure analysis and site-directed mutagenesis. We analyzed residues located in the α-helix as well as the ionizable residue C41. Of the mutants investigated, the effects of the E23A and C41A mutants were the most remarkable. The former displayed significantly improved thermal stability, while its sweetness was unchanged; this mutant remained stable after 30 min of incubation at 85°C. The latter showed increased sweetness and a slight improvement in thermostability. Furthermore, we found that most mutants enhancing the thermostability of the protein were located at the two ends of the α-helix. Molecular biophysics analysis revealed that the state of buried ionizable residues may account for the modulated properties of the mutated proteins. Our results demonstrate that the properties of the sweet protein monellin can be modified by means of bioinformatics analysis, gene manipulation, and protein modification, highlighting the possibility of designing novel effective sweet proteins based on structure-function relationships. Qiulei Liu, Lei Li, Liu Yang, Tianming Liu, Chenggu Cai, and Bo Liu Copyright © 2016 Qiulei Liu et al. All rights reserved. Predicting Long Noncoding RNA and Protein Interactions Using Heterogeneous Network Model Tue, 29 Dec 2015 14:36:27 +0000 Recent studies show that long noncoding RNAs (lncRNAs) participate in diverse biological processes and complex diseases. However, the functions of most lncRNAs are still poorly known.
In this study, we propose a network-based computational method, called lncRNA-protein interaction prediction based on Heterogeneous Network Model (LPIHN), to predict potential lncRNA-protein interactions. First, we construct a heterogeneous network by integrating the lncRNA-lncRNA similarity network, the lncRNA-protein interaction network, and the protein-protein interaction (PPI) network. Then, a random walk with restart is performed on the heterogeneous network to infer novel lncRNA-protein interactions. A leave-one-out cross validation test shows that our approach achieves an AUC of 96.0%. Some lncRNA-protein interactions predicted by our method have been confirmed by recent research or databases, indicating the effectiveness of LPIHN in predicting novel lncRNA-protein interactions. Ao Li, Mengqu Ge, Yao Zhang, Chen Peng, and Minghui Wang Copyright © 2015 Ao Li et al. All rights reserved. Extraction of Protein-Protein Interaction from Scientific Articles by Predicting Dominant Keywords Thu, 10 Dec 2015 06:31:19 +0000 For the automatic extraction of protein-protein interaction information from scientific articles, a machine learning approach is useful. A classifier is generated from training data represented by several features to decide whether a protein pair in a sentence interacts. Specific keywords directly related to interaction, such as “bind” or “interact,” play an important role in training classifiers. We call such a keyword a dominant keyword, since it affects the capability of the classifier. Although identifying dominant keywords is important, whether a keyword is dominant depends on the context in which it occurs. Therefore, we propose a method for predicting whether a keyword is dominant for each instance. In this method, a keyword that yields imbalanced classification results is initially assumed to be a dominant keyword.
Then classifiers are trained separately on the instances with and without the assumed dominant keywords. The validity of each assumed dominant keyword is evaluated based on the classification results of the generated classifiers, and the assumption is updated according to this evaluation. Repeating this process increases the accuracy of dominant keyword prediction. Our experimental results using five corpora show the effectiveness of the proposed method with dominant keyword prediction. Shun Koyabu, Thi Thanh Thuy Phan, and Takenao Ohkawa Copyright © 2015 Shun Koyabu et al. All rights reserved. Cofunctional Subpathways Were Regulated by Transcription Factor with Common Motif, Common Family, or Common Tissue Tue, 24 Nov 2015 14:10:30 +0000 Dissecting the characteristics of transcription factor (TF) regulatory subpathways is helpful for understanding the regulatory functions of TFs in complex biological systems. To gain insight into the influence of TFs on their regulatory subpathways, we constructed a global TF-subpathways network (TSN) to systematically analyze the regulatory effect of common-motif, common-family, or common-tissue TFs on subpathways. Cluster analysis showed that common-motif, common-family, or common-tissue TFs that regulated the same pathway classes tended to cluster together and contribute to the same biological functions that led to disease initiation and progression. Using the Jaccard coefficient, we showed that the functional consistency of subpathways regulated by TF pairs with a common motif, common family, or common tissue was significantly greater than that of random TF pairs at the subpathway, pathway, and pathway class levels. For example, HNF4A (hepatocyte nuclear factor 4, alpha) and NR1I3 (nuclear receptor subfamily 1, group I, member 3) were a pair of TFs with a common motif, common family, and common tissue.
They were involved in drug metabolism pathways and were liver-specific factors required for physiological transcription. In short, we inferred that cofunctional subpathways were regulated by common-motif, common-family, or common-tissue TFs. Fei Su, Desi Shang, Yanjun Xu, Li Feng, Haixiu Yang, Baoquan Liu, Shengyang Su, Lina Chen, and Xia Li Copyright © 2015 Fei Su et al. All rights reserved. MatPred: Computational Identification of Mature MicroRNAs within Novel Pre-MicroRNAs Mon, 23 Nov 2015 09:23:36 +0000 Background. MicroRNAs (miRNAs) are short noncoding RNAs integral to regulating gene expression at the posttranscriptional level. However, experimental methods often fall short in finding miRNAs expressed at low levels or in specific tissues. While several computational methods have been developed for predicting the location of mature miRNAs within the precursor transcript, their prediction accuracy requires significant improvement. Methodology/Principal Findings. Here, we present MatPred, which predicts mature miRNA candidates within novel pre-miRNA transcripts. In addition to the relative locus of the mature miRNA within the pre-miRNA hairpin loop and the minimum free energy, we integrated novel features that describe nucleotide-specific RNA secondary structure characteristics. In total, 94 features were extracted from the mature miRNA loci and flanking regions. The model was trained using a support vector machine with a radial basis function kernel (RBF/SVM). Our method can predict precise locations of mature miRNAs, as confirmed on experimentally verified human pre-miRNAs and pre-miRNA candidates, thus achieving a significant advantage over existing methods. Conclusions. MatPred is a highly effective method for identifying mature miRNAs within novel pre-miRNA transcripts. Our model significantly outperformed three other widely used existing methods. Such processing prediction methods may provide important insight into miRNA biogenesis.
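MatPred's classifier is described above as an SVM with a radial basis function kernel. As a minimal sketch of how such a trained model scores a candidate, the code below implements the RBF kernel and the standard kernel-SVM decision function. The support vectors, coefficients, bias, gamma, and the 2-D feature space are all invented for illustration; they are not MatPred's trained model, and how MatPred scans candidate loci is not reproduced.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float) -> float:
    """Gaussian RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)


def svm_decision(x: Sequence[float], support_vectors: Sequence[Sequence[float]],
                 dual_coefs: Sequence[float], bias: float, gamma: float) -> float:
    """Kernel SVM decision value f(x) = sum_i (alpha_i * y_i) * K(s_i, x) + b.

    A positive value assigns x to the positive class (here: "mature miRNA site").
    """
    return sum(c * rbf_kernel(s, x, gamma)
               for s, c in zip(support_vectors, dual_coefs)) + bias


# Hypothetical trained model in a 2-D feature space (MatPred itself uses 94 features).
svs = [(0.0, 0.0), (2.0, 2.0)]
coefs = [1.0, -1.0]  # alpha_i * y_i for each support vector
print(svm_decision((0.1, 0.0), svs, coefs, bias=0.0, gamma=0.5) > 0)  # True
print(svm_decision((1.9, 2.0), svs, coefs, bias=0.0, gamma=0.5) > 0)  # False
```

Because the RBF kernel decays with squared distance, a candidate is scored mainly by the support vectors nearest to it in feature space; gamma controls how quickly that influence falls off.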
Jin Li, Ying Wang, Lei Wang, Weixing Feng, Kuan Luan, Xuefeng Dai, Chengzhen Xu, Xianglian Meng, Qiushi Zhang, and Hong Liang Copyright © 2015 Jin Li et al. All rights reserved. Corrigendum to “Information-Theoretical Quantifier of Brain Rhythm Based on Data-Driven Multiscale Representation” Thu, 12 Nov 2015 09:43:55 +0000 Young-Seok Choi and Xiaofeng Jia Copyright © 2015 Young-Seok Choi and Xiaofeng Jia. All rights reserved. Improved Pre-miRNA Classification by Reducing the Effect of Class Imbalance Tue, 10 Nov 2015 13:09:26 +0000 MicroRNAs (miRNAs) play important roles in the diverse biological processes of animals and plants. Although prediction methods based on machine learning can identify nonhomologous and species-specific miRNAs, they suffer from severe class imbalance between real and pseudo pre-miRNAs. We propose a pre-miRNA classification method based on cost-sensitive ensemble learning and refer to it as MiRNAClassify. Through a series of iterations, the information in all the positive and negative samples is completely exploited. In each iteration, a new classifier instance is trained on equal numbers of positive and negative samples. In this way, the negative effect of class imbalance is effectively alleviated. The new instance primarily focuses on samples that are easily misclassified. In addition, the positive samples are assigned a higher cost weight than the negative samples. MiRNAClassify is compared with several state-of-the-art methods and some well-known classification models by testing on human, animal, and plant datasets. Cross-validation results indicate that MiRNAClassify significantly outperforms the other methods and models. Newly added pre-miRNAs were also used to further evaluate the ability of these methods to discover novel pre-miRNAs. MiRNAClassify still achieves consistently superior performance and can discover more pre-miRNAs.
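Two ingredients of MiRNAClassify described above are equal-size class sampling that eventually uses every negative sample across iterations and a higher cost weight for positive samples. Both can be sketched as a data-preparation step. This sketch rests on several assumptions: the chunking scheme, the default weights of 2.0 and 1.0, and the function name are hypothetical, and the base classifier and the reweighting toward easily misclassified samples are omitted.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[object, int, float]  # (item, label, cost weight)


def balanced_subsets(positives: Sequence[object], negatives: Sequence[object],
                     pos_cost: float = 2.0, neg_cost: float = 1.0,
                     seed: int = 0) -> List[List[Sample]]:
    """Pair all positives with successive, equally sized chunks of shuffled negatives.

    Every negative lands in exactly one subset, so the whole negative set is used
    across iterations, while each subset is (nearly) class-balanced. Positives carry
    the higher cost weight pos_cost; negatives carry neg_cost.
    """
    if not positives:
        raise ValueError("need at least one positive sample")
    rng = random.Random(seed)
    shuffled = list(negatives)
    rng.shuffle(shuffled)
    chunk = len(positives)
    subsets = []
    for start in range(0, len(shuffled), chunk):
        part = shuffled[start:start + chunk]
        subsets.append([(p, 1, pos_cost) for p in positives] +
                       [(n, 0, neg_cost) for n in part])
    return subsets


# Toy example: 2 real pre-miRNAs vs 5 pseudo hairpins gives 3 training subsets.
subsets = balanced_subsets(["real1", "real2"], [f"pseudo{i}" for i in range(5)])
for s in subsets:
    print([(item, label) for item, label, _ in s])
```

Each subset would train one ensemble member; the last subset is smaller when the negative count is not a multiple of the positive count.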
Yingli Zhong, Ping Xuan, Ke Han, Weiping Zhang, and Jianzhong Li Copyright © 2015 Yingli Zhong et al. All rights reserved. Comparative Genome and Network Centrality Analysis to Identify Drug Targets of Mycobacterium tuberculosis H37Rv Thu, 05 Nov 2015 13:16:24 +0000 Potential drug targets of Mycobacterium tuberculosis H37Rv were identified through systematically integrated comparative genome and network centrality analysis. Comparative analysis of the complete genome of Mycobacterium tuberculosis H37Rv against the Database of Essential Genes (DEG) yields a list of proteins that are essential for the growth and survival of the pathogen. Proteins that are nonhomologous to human proteins were selected. The resulting proteins were then prioritized using four network centrality measures: degree, closeness, betweenness, and eigenvector. Proteins whose centrality values are close to the centre of gravity of the interactome network were proposed as a final list of potential drug targets for the pathogen. The use of an integrated approach is believed to increase the success of the drug target identification process. For validation, selective comparisons were made between the proposed targets and drug targets previously identified by various other methods. About half of these proteins have already been reported as potential drug targets. We believe that the identified proteins will be an important input to experimental studies, which in turn could save a considerable amount of time and cost in drug target discovery. Tilahun Melak and Sunita Gakkhar Copyright © 2015 Tilahun Melak and Sunita Gakkhar. All rights reserved. The Roles of miR-26, miR-29, and miR-203 in the Silencing of the Epigenetic Machinery during Melanocyte Transformation Wed, 04 Nov 2015 08:28:53 +0000 The epigenetic marks located throughout the genome exhibit great variation between normal and transformed cancer cells.
While normal cells contain hypomethylated CpG islands near gene promoters and hypermethylated repetitive DNA, the opposite pattern is observed in cancer cells. Recently, it has been reported that alterations in the microenvironment of melanocytes, such as substrate adhesion blockade, result in the selection of anoikis-resistant cells, which have tumorigenic characteristics. Melanoma cells obtained through this model show an altered epigenetic pattern, which represents one of the first events during the malignant transformation of melanocytes. Because microRNAs are involved in controlling components of the epigenetic machinery, the aim of this work was to evaluate the potential association between the expression of miR-203, miR-26, and miR-29 family members and the genes Dnmt3a, Dnmt3b, Mecp2, and Ezh2 during cell transformation. Our results show that the microRNAs and their validated or predicted targets are inversely expressed, indicating that these molecules are involved in epigenetic reprogramming. We also show that miR-203 downregulates Dnmt3b in mouse melanocyte cells. In addition, treatment with 5-aza-CdR promotes the expression of miR-26 and miR-29 in a nonmetastatic melanoma cell line. Considering the occurrence of CpG islands near the miR-26 and miR-29 promoters, these data suggest that these microRNAs might be epigenetically regulated in cancer. Cláudia Regina Gasque Schoof, Alberto Izzotti, Miriam Galvonas Jasiulionis, and Luciana dos Reis Vasques Copyright © 2015 Cláudia Regina Gasque Schoof et al. All rights reserved. Mining for Candidate Genes Related to Pancreatic Cancer Using Protein-Protein Interactions and a Shortest Path Approach Tue, 03 Nov 2015 09:37:59 +0000 Pancreatic cancer (PC) is a highly malignant tumor derived from pancreatic tissue and is one of the leading causes of death from cancer.
Its molecular mechanism has been partially revealed by validating its oncogenes and tumor suppressor genes; however, the available data remain insufficient for medical workers to design effective treatments. Large-scale identification of PC-related genes can promote studies on PC. In this study, we propose a computational method for mining new candidate PC-related genes. A large network was constructed using protein-protein interaction information, and a shortest path approach was applied to mine new candidate genes based on validated PC-related genes. In addition, a permutation test was adopted to further select key candidate genes. Finally, for all discovered candidate genes, the likelihood that they are novel PC-related genes is discussed based on their currently known functions. Fei Yuan, Yu-Hang Zhang, Sibao Wan, ShaoPeng Wang, and Xiang-Yin Kong Copyright © 2015 Fei Yuan et al. All rights reserved. Big Data and Network Biology 2015 Sun, 01 Nov 2015 07:06:50 +0000 Shigehiko Kanaya, Md. Altaf-Ul-Amin, Samuel K. Kiboi, and Farit Mochamad Afendi Copyright © 2015 Shigehiko Kanaya et al. All rights reserved. An Effective Big Data Supervised Imbalanced Classification Approach for Ortholog Detection in Related Yeast Species Thu, 29 Oct 2015 13:34:56 +0000 Orthology detection requires more effective scaling algorithms. In this paper, a set of gene pair features based on similarity measures (alignment scores, sequence length, gene membership to conserved regions, and physicochemical profiles) is combined in a supervised pairwise ortholog detection approach to improve effectiveness, given the low ratio of orthologs to all possible pairwise comparisons between two genomes. In this scenario, big data supervised classifiers that manage the imbalance between ortholog and nonortholog pair classes allow for an effective scaling solution built from two genomes and extended to other genome pairs.
The supervised approach was compared with the RBH, RSD, and OMA algorithms using the following yeast genome pairs as benchmark datasets: Saccharomyces cerevisiae-Kluyveromyces lactis, Saccharomyces cerevisiae-Candida glabrata, and Saccharomyces cerevisiae-Schizosaccharomyces pombe. Because of the large amount of imbalanced data, building and testing the supervised model were only possible using big data supervised classifiers that manage imbalance. Evaluation metrics that take low ortholog ratios into account were applied. In terms of effectiveness, MapReduce Random Oversampling combined with Spark SVM outperformed RBH, RSD, and OMA, probably because it considers gene pair features beyond alignment similarities, combined with advances in big data supervised classification. Deborah Galpert, Sara del Río, Francisco Herrera, Evys Ancede-Gallardo, Agostinho Antunes, and Guillermin Agüero-Chapin Copyright © 2015 Deborah Galpert et al. All rights reserved. Frontiers in Integrative Genomics and Translational Bioinformatics Wed, 28 Oct 2015 13:36:26 +0000 Zhongming Zhao, Victor X. Jin, Yufei Huang, Chittibabu Guda, and Jianhua Ruan Copyright © 2015 Zhongming Zhao et al. All rights reserved. Using Weighted Sparse Representation Model Combined with Discrete Cosine Transformation to Predict Protein-Protein Interactions from Protein Sequence Wed, 28 Oct 2015 07:26:10 +0000 Increasing demand for knowledge of protein-protein interactions (PPIs) is promoting the development of methods for predicting protein interaction networks. Although high-throughput technologies have generated considerable PPI data for various organisms, these technologies have inevitable drawbacks, such as high cost, time consumption, and an inherently high false positive rate. For this reason, computational methods for predicting PPIs are drawing more and more attention. In this study, we report a computational method for predicting PPIs using the information of protein sequences.
The main improvements come from two sources. The first is a novel protein sequence representation, obtained by applying the discrete cosine transform (DCT) to the substitution matrix representation (SMR). The second is the use of a weighted sparse representation-based classifier (WSRC). On the Yeast, Human, and H. pylori PPI datasets, the method achieved average accuracies of 96.28%, 96.30%, and 86.74%, respectively, significantly better than previous methods. These promising results demonstrate that the proposed method is feasible, robust, and powerful. To further evaluate the proposed method, we compared it with the state-of-the-art support vector machine (SVM) classifier. Extensive experiments were also performed in which Yeast PPI samples were used as the training set to predict PPIs in five other species datasets.
Yu-An Huang, Zhu-Hong You, Xin Gao, Leon Wong, and Lirong Wang Copyright © 2015 Yu-An Huang et al. All rights reserved.

Systematic Analysis and Prediction of In Situ Cross Talk of O-GlcNAcylation and Phosphorylation
Tue, 27 Oct 2015 06:51:33 +0000
Reversible posttranslational modification (PTM) plays a very important role in biological processes by changing the properties of proteins. Many proteins carry multiple PTMs, so cross talk between PTMs has become an intriguing topic that draws much attention. A growing body of evidence suggests that PTMs work together to accomplish specific biological functions. However, both the general principles and the underlying mechanisms of PTM cross talk remain elusive. In this study, we used large-scale datasets to perform evolutionary conservation analysis, gene ontology enrichment, and motif extraction on proteins in which O-GlcNAcylation and phosphorylation co-occur on the same residue. We found that proteins with in situ O-GlcNAc/Phos cross talk were significantly enriched in some specific gene ontology terms and that no obvious evolutionary pressure was observed.
Moreover, three functional motifs associated with O-GlcNAc/Phos sites were extracted. We further used sequence features and GO features to predict O-GlcNAc/Phos cross talk sites with SVM models. These models were built separately on phosphorylated sites and on O-GlcNAcylated sites. The classifier based on phosphorylated sites achieved an AUC of 0.896, and the classifier based on O-GlcNAcylated sites achieved an AUC of 0.843. Both classifiers performed relatively better than other existing methods.
Heming Yao, Ao Li, and Minghui Wang Copyright © 2015 Heming Yao et al. All rights reserved.

JPPRED: Prediction of Types of J-Proteins from Imbalanced Data Using an Ensemble Learning Method
Mon, 26 Oct 2015 06:16:47 +0000
Different types of J-proteins perform distinct functions in chaperone processes and disease development. Accurate identification of J-protein types will provide significant clues to the mechanisms of J-proteins and contribute to drug development for these diseases. In this study, an ensemble predictor called JPPRED is proposed for J-protein prediction. It uses hybrid features, including split amino acid composition (SAAC), pseudo amino acid composition (PseAAC), and the position-specific scoring matrix (PSSM). To deal with the imbalanced benchmark dataset, the synthetic minority oversampling technique (SMOTE) and an undersampling technique are applied. The average sensitivity of JPPRED on each of the individual feature spaces lies in the range 0.744–0.851, indicating the discriminative power of these features. In addition, JPPRED yields its highest average sensitivity, 0.875, using the hybrid feature space of SAAC, PseAAC, and PSSM. Compared with individual base classifiers, JPPRED achieves more balanced and better performance for each type of J-protein. To evaluate the prediction performance objectively, JPPRED is compared with a previous study.
Encouragingly, JPPRED achieves balanced performance for each type of J-protein, which is significantly superior to that of the existing method. JPPRED is anticipated to be a promising tool for J-protein prediction.
Lina Zhang, Chengjin Zhang, Rui Gao, and Runtao Yang Copyright © 2015 Lina Zhang et al. All rights reserved.

Identification of Gene Expression Pattern Related to Breast Cancer Survival Using Integrated TCGA Datasets and Genomic Tools
Tue, 20 Oct 2015 14:15:40 +0000
Several large-scale human cancer genomics projects, such as TCGA, have produced huge amounts of genomic and clinical data. Researchers can use these data to identify meaningful genomic alterations involved in tumor development and metastasis. In this study, we developed a web-based TCGA data analysis platform called TCGA4U, which provides a visualization solution to illustrate the relationship between these genomic alterations and clinical data. A whole-genome screen of survival-related gene expression patterns in breast cancer was performed. The genes affecting breast cancer patient survival were divided into two patterns, and the gene list of each pattern was analyzed separately with DAVID. The results showed that mitochondrial ribosomes play a more crucial role in cancer development. We also found that breast cancer patients with low HSPA2 expression had shorter overall survival. This differs markedly from findings on HSPA2 expression patterns in other cancer types. TCGA4U provides a new perspective on the TCGA datasets. We believe it can inspire more biomedical researchers to study and explain the genomic alterations in cancer development and to discover more targeted therapies to help more cancer patients.
Zhenzhen Huang, Huilong Duan, and Haomin Li Copyright © 2015 Zhenzhen Huang et al. All rights reserved.
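The TCGA4U entry reports that breast cancer patients with low HSPA2 expression had shorter overall survival. The abstract does not say which statistical procedure produced this result. One common way to compare survival between expression groups is to split patients at an expression cutoff and compute a Kaplan–Meier curve for each group. The sketch below follows that approach under stated assumptions. The median cutoff, the helper names, and the toy data are hypothetical and are not taken from the paper.

```python
from statistics import median


def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function.

    times  -- follow-up time for each patient
    events -- 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Pool every patient whose follow-up ends at this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored patients leave the risk set
    return curve


def split_by_expression(expr, times, events):
    """Hypothetical median split into low- and high-expression groups."""
    cutoff = median(expr)
    low = [(t, e) for x, t, e in zip(expr, times, events) if x <= cutoff]
    high = [(t, e) for x, t, e in zip(expr, times, events) if x > cutoff]
    return low, high


# Toy data (not from TCGA): expression level, months of follow-up, death flag
expr = [1.2, 0.8, 3.5, 4.1, 0.5, 2.9]
times = [10, 5, 30, 40, 7, 25]
events = [1, 1, 0, 1, 1, 0]

low, high = split_by_expression(expr, times, events)
for name, group in (("low", low), ("high", high)):
    t, e = zip(*group)
    print(name, kaplan_meier(list(t), list(e)))
```

In practice, the two curves would usually be compared with a log-rank test. The choice of cutoff (median, quartiles, or an optimized threshold) can strongly affect the result.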
Proteomic Study to Survey the CIGB-552 Antitumor Effect
Tue, 20 Oct 2015 11:43:43 +0000
CIGB-552 is a cell-penetrating peptide that exerts antitumor effects on cancer cells in vitro and in vivo. In the present work, the mechanism underlying this anticancer activity was studied in cultured cancer cell lines using chemical proteomics and expression-based proteomics. Chemical proteomics showed that CIGB-552 interacts with at least 55 proteins. A temporal differential proteomics analysis based on iTRAQ quantification was performed to identify proteins modulated by CIGB-552. The proteomic profile includes 72 proteins differentially expressed in response to CIGB-552 treatment. Proteins related to cell proliferation and apoptosis were identified by both approaches. In line with previous findings, the proteomic data revealed that CIGB-552 inhibits the NF-κB signaling pathway. Furthermore, proteins related to cell invasion were differentially modulated by CIGB-552 treatment, suggesting new potential uses of CIGB-552 as an anticancer agent. Overall, the current study contributes to a better understanding of the antitumor mechanism of action of CIGB-552.
Arielis Rodríguez-Ulloa, Jeovanis Gil, Yassel Ramos, Lilian Hernández-Álvarez, Lisandra Flores, Brizaida Oliva, Dayana García, Aniel Sánchez-Puente, Alexis Musacchio-Lasa, Jorge Fernández-de-Cossio, Gabriel Padrón, Luis J. González López, Vladimir Besada, and Maribel Guerra-Vallespí Copyright © 2015 Arielis Rodríguez-Ulloa et al. All rights reserved.

Personal Verification/Identification via Analysis of the Peripheral ECG Leads: Influence of the Personal Health Status on the Accuracy
Mon, 19 Oct 2015 14:11:32 +0000
Traditional means of identity validation (PIN codes, passwords) and physiological and behavioral biometric characteristics (fingerprint, iris, and speech) are susceptible to hacker attacks and/or falsification.
This paper presents a method for person verification/identification based on the correlation between present and previous recordings of three signals: limb ECG lead I, limb ECG lead II, and the first principal ECG component calculated from them. Linear and nonlinear combinations of these three correlations are also used. For the verification task, a one-to-one scenario is applied, and threshold values for the three correlations and their combinations are derived. The identification task follows a one-to-many scenario: the tested subject is identified by the maximal correlation with a previously recorded ECG in a database. The population-based ECG-ILSA database of 540 patients (147 healthy subjects, 175 patients with cardiac diseases, and 218 with hypertension) was considered. In addition, a common reference PTB dataset (14 healthy individuals) with a short time interval between the two acquisitions was taken into account. The results on the ECG-ILSA database were satisfactory for healthy people, and there was no significant decrease for nonhealthy patients, demonstrating the robustness of the proposed method. On the PTB database, the method provides an identification accuracy of 92.9% and a verification sensitivity and specificity of 100% and 89.9%, respectively.
Irena Jekova and Giovanni Bortolan Copyright © 2015 Irena Jekova and Giovanni Bortolan. All rights reserved.

Building Integrated Ontological Knowledge Structures with Efficient Approximation Algorithms
Tue, 13 Oct 2015 13:54:53 +0000
The integration of ontologies builds knowledge structures that bring new understanding of existing terminologies and their associations. With the steady increase in the number of ontologies, automatic integration is preferable to manual solutions in many applications. However, available work on ontology integration is largely heuristic, without guarantees on the quality of the integration results. In this work, we focus on the integration of ontologies with hierarchical structures.
We identified optimal structures in this problem and proposed optimal and efficient approximation algorithms for integrating a pair of ontologies. Furthermore, we extended the basic problem to the integration of a large number of ontologies and correspondingly proposed an efficient approximation algorithm for integrating multiple ontologies. An empirical study on both real ontologies and synthetic data demonstrates the effectiveness of the proposed approaches. In addition, the results of integrating the Gene Ontology with the National Drug File Reference Terminology suggest that our method provides a novel way to perform association studies between biomedical terms.
Yang Xiang and Sarath Chandra Janga Copyright © 2015 Yang Xiang and Sarath Chandra Janga. All rights reserved.

Predicting Drug-Target Interactions via Within-Score and Between-Score
Mon, 12 Oct 2015 13:48:13 +0000
Network inference and local classification models have been shown to be useful for predicting new potential drug-target interactions (DTIs) to assist in drug discovery or drug repositioning. The idea is to represent drugs, targets, and their interactions as a bipartite network or an adjacency matrix. However, existing methods have not yet appropriately addressed several issues:
- inference is powerless for isolated subnetworks;
- classifiers are biased when positive samples are insufficient;
- many local classifiers must be trained;
- the relationship between known DTIs and unapproved drug-target pairs (DTPs) is unavailable.
More effective approaches to address these issues are therefore desirable.
In this paper, we first present improved drug similarities and target similarities. We then characterize each DTP as a feature vector of within-scores and between-scores, which offers the following advantages: (1) a uniform vector representation for all types of DTPs; (2) only one global classifier, with less bias thanks to adequate positive samples; and (3), more importantly, a visualized relationship between known DTIs and unapproved DTPs. The effectiveness of our approach is demonstrated in two ways: by comparison with other popular methods under cross validation, and by predicting potential interactions for DTPs that are then validated against existing databases.
Jian-Yu Shi, Zun Liu, Hui Yu, and Yong-Jun Li Copyright © 2015 Jian-Yu Shi et al. All rights reserved.

Sequence-Based Prediction of RNA-Binding Proteins Using Random Forest with Minimum Redundancy Maximum Relevance Feature Selection
Mon, 12 Oct 2015 11:18:29 +0000
The prediction of RNA-binding proteins is one of the most challenging problems in computational biology. Although some studies have investigated this problem, prediction accuracy is still not sufficient. In this study, a highly accurate method was developed to predict RNA-binding proteins from amino acid sequences. It uses random forests with the minimum redundancy maximum relevance (mRMR) method, followed by incremental feature selection (IFS). We incorporated conjoint triad features and three novel features: binding propensity (BP), nonbinding propensity (NBP), and evolutionary information combined with physicochemical properties (EIPP). The results showed that these novel features play important roles in improving the performance of the predictor. Using the mRMR-IFS method, our predictor achieved its best performance: 86.62% accuracy and a Matthews correlation coefficient of 0.737. The high prediction accuracy suggests that our method can be a useful approach for identifying RNA-binding proteins from sequence information.
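The RNA-binding protein entry ranks features with mRMR before incremental feature selection. The sketch below shows the greedy mRMR idea under three assumptions:
- features are already discretized;
- relevance and redundancy are both measured by mutual information;
- the difference criterion (relevance minus mean redundancy) is used.

The abstract does not state which mRMR variant or discretization the authors used. The feature names and toy data below are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        mi += p_ab * log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def mrmr_rank(features, labels, k):
    """Greedy mRMR ranking using the difference (relevance - redundancy) criterion.

    features -- dict: feature name -> list of discrete values
    labels   -- class labels, one per sample
    k        -- number of features to select
    """
    relevance = {f: mutual_information(col, labels) for f, col in features.items()}
    selected = []
    remaining = sorted(features)  # sorted for deterministic tie-breaking

    def score(f):
        if not selected:
            return relevance[f]
        # Mean MI with the already-selected features measures redundancy.
        redundancy = sum(mutual_information(features[f], features[s])
                         for s in selected) / len(selected)
        return relevance[f] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy discretized features (hypothetical): "a_copy" duplicates "a".
y = [0, 0, 0, 1, 1, 1, 1, 1]
features = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],
    "a_copy": [0, 0, 0, 0, 1, 1, 1, 1],
    "b": [0, 0, 1, 1, 0, 0, 1, 1],
}
print(mrmr_rank(features, y, 3))  # prints ['a', 'b', 'a_copy']
```

A relevance-only ranking would place the duplicate "a_copy" second. The redundancy term pushes it behind the weaker but independent feature "b". IFS would then train a classifier (a random forest in this study) on the top 1, 2, ..., k features of such a ranking and keep the prefix with the best cross-validated performance.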
Xin Ma, Jing Guo, and Xiao Sun Copyright © 2015 Xin Ma et al. All rights reserved.

RNAseq by Total RNA Library Identifies Additional RNAs Compared to Poly(A) RNA Library
Mon, 12 Oct 2015 09:19:06 +0000
The most popular RNA library used for RNA sequencing is the poly(A)-captured RNA library, which captures RNA based on the presence of a poly(A) tail at the 3′ end. Another type of library for RNA sequencing is the total RNA library, which differs from the poly(A) library in capture method and price. The total RNA library costs more, and its capture of RNA does not depend on the presence of poly(A) tails. In practice, only ribosomal RNAs and small RNAs are washed out during total RNA library preparation. To evaluate the RNA detection ability of both libraries, we designed a study using RNA sequencing data from the same two breast cancer cell lines prepared with each library. We found that the RNA expression values captured by the two libraries were highly correlated. However, the number of RNAs captured was significantly higher for the total RNA library. Furthermore, we identified several subsets of protein-coding RNAs that were not captured efficiently by the poly(A) library. One of the most noticeable subsets is the histone-encoding genes, which lack a poly(A) tail.
Yan Guo, Shilin Zhao, Quanhu Sheng, Mingsheng Guo, Brian Lehmann, Jennifer Pietenpol, David C. Samuels, and Yu Shyr Copyright © 2015 Yan Guo et al. All rights reserved.

Construction of Pancreatic Cancer Classifier Based on SVM Optimized by Improved FOA
Mon, 12 Oct 2015 08:57:17 +0000
A novel method is proposed to establish a pancreatic cancer classifier. First, basic quantum concepts and the fruit fly optimization algorithm (FOA) are introduced. Then, FOA is improved by quantum coding and quantum operations, and a new smell concentration determination function is defined.
Finally, the improved FOA is used to optimize the parameters of a support vector machine (SVM), and the classifier is built with the optimized SVM. To verify the effectiveness of the proposed method, SVM and other classification methods were chosen for comparison. The experimental results show that the proposed method improves classifier performance and takes less time.
Huiyan Jiang, Di Zhao, Ruiping Zheng, and Xiaoqi Ma Copyright © 2015 Huiyan Jiang et al. All rights reserved.

OperomeDB: A Database of Condition-Specific Transcription Units in Prokaryotic Genomes
Mon, 12 Oct 2015 08:53:14 +0000
Background. In prokaryotic organisms, a substantial fraction of adjacent genes are organized into operons. Operons are codirectionally organized genes in prokaryotic genomes that share a common promoter and terminator. Several available operon databases provide information with varying levels of reliability, but very few resources provide experimentally supported results. We therefore believe that the biological community could benefit from a new operon prediction database with operons predicted from next-generation RNA-seq datasets. Description. We present OperomeDB, a database that provides an ensemble of all the operons predicted for bacterial genomes from available RNA-seq datasets across a wide range of experimental conditions. Several studies have recently confirmed that prokaryotic operon structure is dynamic, with significant alterations across environmental and experimental conditions. However, there are no comprehensive databases for studying such variations across prokaryotic transcriptomes. Currently, our database contains nine bacterial organisms and 168 transcriptomes for which we predicted operons. The user interface is simple and easy to use for visualizing, downloading, and querying data.
In addition, because OperomeDB can load custom datasets, users can also compare their own datasets with publicly available transcriptomic data for an organism. Conclusion. OperomeDB should not only aid experimental groups working on transcriptome analysis of specific organisms but also enable studies in computational and comparative operomics.
Kashish Chetal and Sarath Chandra Janga Copyright © 2015 Kashish Chetal and Sarath Chandra Janga. All rights reserved.

Coexpression Network Analysis of miRNA-142 Overexpression in Neuronal Cells
Sun, 11 Oct 2015 14:01:19 +0000
MicroRNAs are small noncoding RNA molecules that are differentially expressed in diverse biological processes and are involved in the regulation of multiple genes. A number of sites in the 3′ untranslated regions (UTRs) of different mRNAs allow complementary binding of a microRNA, leading to posttranscriptional regulation of those mRNAs. miRNA-142, one of the microRNAs overexpressed in neurons, has been found to regulate the SIRT1 and MAOA genes. Differential analysis of gene expression data focuses on identifying up- or downregulated genes. It therefore ignores many relationships between genes affected by miRNA-142 overexpression in a cell. We therefore applied a correlation network model to identify coexpressed genes and to study the impact of miRNA-142 overexpression on this network. Combining multiple sources of knowledge is useful for inferring meaningful relationships in systems biology. We applied a coexpression model to data from wild-type and miR-142-overexpressing neuronal cells. We then integrated miRNA seed sequence mapping information to identify the genes most affected by this overexpression. Large differences in the enriched networks revealed that genes related to nervous system development, such as TEAD2, PLEKHA6, and POGLUT1, were strongly affected by miRNA-142 overexpression.
Ishwor Thapa, Howard S. Fox, and Dhundy Bastola Copyright © 2015 Ishwor Thapa et al.
All rights reserved.

Assessing Computational Steps for CLIP-Seq Data Analysis
Sun, 11 Oct 2015 13:53:05 +0000
RNA-binding proteins (RBPs) are key players in regulating gene expression at the posttranscriptional level. CLIP-Seq can provide a genome-wide map of protein-RNA interactions and has been increasingly used to decipher RBP-mediated posttranscriptional regulation. Generating highly reliable binding sites from CLIP-Seq requires not only stringent library preparation but also considerable computational effort. Here we present a first systematic evaluation of the major computational steps for identifying RBP binding sites from CLIP-Seq data, including preprocessing, the choice of control samples, peak normalization, and motif discovery. We found that three practices can reduce the bias introduced by RNA abundance and improve the quality of detected binding sites: avoiding PCR amplification artifacts, normalizing to input RNA or mRNAseq, and defining the background model from control samples. Our findings can serve as a general guideline for CLIP experiment design and for the comprehensive analysis of CLIP-Seq data.
Qi Liu, Xue Zhong, Blair B. Madison, Anil K. Rustgi, and Yu Shyr Copyright © 2015 Qi Liu et al. All rights reserved.

Classification of Cancer Primary Sites Using Machine Learning and Somatic Mutations
Sun, 11 Oct 2015 13:45:41 +0000
Accurate classification of human cancer, including its primary site, is important for better understanding cancer and for developing effective therapeutic strategies. The large body of available somatic mutation data provides a great opportunity to investigate cancer classification using machine learning. Here, we used a support vector machine to explore the patterns of 1,760,846 somatic mutations identified in 230,255 cancer patients, together with gene function information.
Specifically, we performed a multiclass classification experiment over 17 tumor sites for 6,751 subjects. The predictors were gene symbol, somatic mutation, chromosome, and gene functional pathway. The baseline using only gene features achieved an accuracy of 0.57, which improved to 0.62 when mutation and chromosome information were added. Among the predictable primary tumor sites, five (large intestine, liver, skin, pancreas, and lung) achieved an F-measure above 0.70. The large intestine model ranked first, with an F-measure of 0.87. The results demonstrate that somatic mutation information is useful for predicting primary tumor sites with machine learning models. To our knowledge, this is the first investigation of primary site classification using machine learning and somatic mutation data.
Yukun Chen, Jingchun Sun, Liang-Chin Huang, Hua Xu, and Zhongming Zhao Copyright © 2015 Yukun Chen et al. All rights reserved.

A Comparison of Variant Calling Pipelines Using Genome in a Bottle as a Reference
Sun, 11 Oct 2015 13:43:57 +0000
High-throughput sequencing, especially exome sequencing, is a popular diagnostic tool, but it is difficult to determine which tools are best at analyzing these data. In this study, we use the NIST Genome in a Bottle results as a novel resource for validating our exome analysis pipeline. We combine six different aligners and five different variant callers and test all 30 resulting pipelines. The test exome is a human exome that was used to help generate the list of variants detected by the Genome in a Bottle Consortium. Of these 30 pipelines, Novoalign in conjunction with GATK UnifiedGenotyper exhibited the highest sensitivity while maintaining a low number of false positives for SNVs.
However, indels clearly remain difficult for any pipeline to handle: none of the tools achieved an average sensitivity higher than 33% or a positive predictive value (PPV) higher than 53%. Lastly, as expected, aligners were found to play as vital a role in variant detection as the variant callers themselves.
Adam Cornish and Chittibabu Guda Copyright © 2015 Adam Cornish and Chittibabu Guda. All rights reserved.

How to Choose In Vitro Systems to Predict In Vivo Drug Clearance: A System Pharmacology Perspective
Sun, 11 Oct 2015 13:35:32 +0000
In vitro metabolism data have become increasingly important for predicting human drug clearance at large scale. Relevant information (in vitro metabolism data and in vivo human clearance values) was collated from the literature for thirty-five drugs that satisfied the entry criteria for probe drugs. The performance of different in vitro systems, including the Escherichia coli, yeast, lymphoblastoid, and baculovirus systems, was then compared after in vitro-in vivo extrapolation. The baculovirus system, which can provide most of the data, predicts clearance with almost the same accuracy as the other systems. In most cases, it also has a smaller CV in scaling factors. Therefore, the baculovirus system can be considered a suitable system for large-scale drug clearance prediction.
Lei Wang, ChienWei Chiang, Hong Liang, Hengyi Wu, Weixing Feng, Sara K. Quinney, Jin Li, and Lang Li Copyright © 2015 Lei Wang et al. All rights reserved.

A Genetic Algorithm Based Support Vector Machine Model for Blood-Brain Barrier Penetration Prediction
Sun, 04 Oct 2015 11:09:01 +0000
The blood-brain barrier (BBB) is a highly complex physical barrier that determines which substances are allowed to enter the brain. The support vector machine (SVM) is a kernel-based machine learning method that is widely used in QSAR studies.
For a successful SVM model, the kernel parameters and the feature subset selection are the most important factors affecting prediction accuracy. In most studies, they are treated as two independent problems, but it has been shown that they can affect each other. We designed and implemented a genetic algorithm (GA) to optimize kernel parameters and feature subset selection for SVM regression and applied it to BBB penetration prediction. The results show that our GA/SVM model is more accurate than other currently available log BB models. Optimizing SVM parameters and the feature subset simultaneously with a genetic algorithm is therefore a better approach than methods that treat the two problems separately. Analysis of our log BB model suggests that carboxylic acid groups, polar surface area (PSA)/hydrogen-bonding ability, lipophilicity, and molecular charge play important roles in BBB penetration. Among these properties, lipophilicity enhances BBB penetration, whereas all the others are negatively correlated with it.
Daqing Zhang, Jianfeng Xiao, Nannan Zhou, Mingyue Zheng, Xiaomin Luo, Hualiang Jiang, and Kaixian Chen Copyright © 2015 Daqing Zhang et al. All rights reserved.

How to Use SNP_TATA_Comparator to Find a Significant Change in Gene Expression Caused by the Regulatory SNP of This Gene’s Promoter via a Change in Affinity of the TATA-Binding Protein for This Promoter
Sun, 04 Oct 2015 07:28:06 +0000
The use of biomedical SNP markers of diseases can improve the effectiveness of treatment. Within the framework of translational research, only one method for identifying SNP markers is commonly accepted: genotyping patients and then searching for SNPs that are more frequent in patients than in the norm. Bioinformatics applications that target the millions of unannotated SNPs from the “1000 Genomes” project can make this search for SNP markers more focused and less expensive.
We used our Web service involving Fisher’s -score for candidate SNP markers to find a significant change in a gene’s expression. Here, we analyzed the change caused by SNPs in the gene’s promoter via a change in the affinity of the TATA-binding protein for this promoter. We provide examples and discuss how to use this bioinformatics application in the practical analysis of unannotated SNPs from the “1000 Genomes” project. Using known biomedical SNP markers, we identified 17 novel candidate SNP markers nearby:
- rs549858786 (rheumatoid arthritis);
- rs72661131 (cardiovascular events in rheumatoid arthritis);
- rs562962093 (stroke);
- rs563558831 (cyclophosphamide bioactivation);
- rs55878706 (malaria resistance, leukopenia);
- rs572527200 (asthma, systemic sclerosis, and psoriasis);
- rs371045754 (hemophilia B);
- rs587745372 (cardiovascular events);
- rs372329931, rs200209906, rs367732974, and rs549591993 (all four: cancer);
- rs17231520 and rs569033466 (both: atherosclerosis);
- rs63750953, rs281864525, and rs34166473 (all three: malaria resistance, thalassemia).

Mikhail Ponomarenko, Dmitry Rasskazov, Olga Arkova, Petr Ponomarenko, Valentin Suslov, Ludmila Savinkova, and Nikolay Kolchanov Copyright © 2015 Mikhail Ponomarenko et al. All rights reserved.

Shaped 3D Singular Spectrum Analysis for Quantifying Gene Expression, with Application to the Early Zebrafish Embryo
Thu, 01 Oct 2015 13:15:34 +0000
Recent progress in microscopy technologies, biological markers, and automated processing methods is making it possible to develop gene expression atlases at cellular resolution over whole embryos. Raw gene expression data are usually very noisy. This noise comes from both experimental (technical/methodological) and true biological sources (stochastic biochemical processes). In addition, the imaged cells or nuclei are irregularly arranged in 3D space.
This makes the processing, extraction, and study of expression signals and intrinsic biological noise a serious challenge for 3D data, requiring new computational approaches. Here, we present a new approach for studying gene expression in nuclei located in a thick layer around a spherical surface. The method consists of five steps:
1. depth equalization on the sphere;
2. flattening;
3. interpolation to a regular grid;
4. pattern extraction by Shaped 3D singular spectrum analysis (SSA);
5. interpolation back to the original nuclear positions.

The approach is demonstrated on several examples of gene expression in the zebrafish egg (a model system in vertebrate development). It is tested on several different data geometries (e.g., nuclear positions) and different forms of gene expression patterns. Fully 3D datasets on developmental gene expression are becoming increasingly available; we discuss the prospects of applying 3D-SSA to data processing and analysis in this growing field.
Alex Shlemov, Nina Golyandina, David Holloway, and Alexander Spirov Copyright © 2015 Alex Shlemov et al. All rights reserved.

Analysis of Chemical Properties of Edible and Medicinal Ginger by Metabolomics Approach
Thu, 01 Oct 2015 13:06:22 +0000
In traditional herbal medicine, a comprehensive understanding of bioactive constituents is important for analyzing their true medicinal function. We investigated the chemical properties of medicinal and edible ginger cultivars using a liquid chromatography-mass spectrometry (LC-MS) approach. Our principal component analysis (PCA) results indicate that acetylated gingerol derivatives, rather than gingerol or shogaol, are the important medicinal indicators. A newly developed ginger cultivar, Z. officinale cv. Ogawa Umare (“Ogawa Umare,” OG), contains more active ingredients. In terms of both chemical constituents and rhizome yield, it shows promise as a new resource for producing ginger-derived herbal medicines.
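The ginger entry uses PCA to identify which constituents separate the cultivars. The sketch below illustrates the basic computation under assumptions not stated in the abstract:
- the data form a samples × compounds intensity table;
- columns are mean-centered only;
- components come from an SVD.

The authors' actual preprocessing, and the compound names and values below, are hypothetical. The PC1 loadings indicate which compounds drive the separation.

```python
import numpy as np


def pca(x, n_components=2):
    """PCA via SVD of the column-centered data matrix.

    x -- array of shape (n_samples, n_features), e.g. LC-MS peak intensities
    Returns (scores, loadings, explained_variance_ratio).
    """
    centered = x - x.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # sample coordinates
    loadings = vt[:n_components].T                   # compound weights per PC
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]


# Hypothetical peak table: rows are samples, columns are compounds
compounds = ["compound_A", "compound_B", "compound_C"]
x = np.array([
    [1.0, 5.0, 2.0],  # edible-type samples (toy values)
    [1.2, 5.1, 2.1],
    [4.0, 5.0, 2.0],  # medicinal-type samples (toy values)
    [4.2, 4.9, 1.9],
])
scores, loadings, explained = pca(x)
top = compounds[int(np.argmax(np.abs(loadings[:, 0])))]
print("PC1 explained variance ratio:", round(float(explained[0]), 3))
print("compound with largest PC1 loading:", top)
```

Metabolomics studies often also scale each column (for example, unit variance or Pareto scaling) before PCA. That choice can change which compounds dominate the loadings.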
Ken Tanaka, Masanori Arita, Hiroaki Sakurai, Naoaki Ono, and Yasuhiro Tezuka Copyright © 2015 Ken Tanaka et al. All rights reserved.

EMRlog Method for Computer Security for Electronic Medical Records with Logic and Data Mining
Thu, 01 Oct 2015 13:04:50 +0000
Keeping a hospital computer system functioning properly is arduous work for managers and staff. Inconsistent policies are frequent and can cause enormous problems, such as stolen information, frequent failures, and loss of all or part of the hospital data. This paper presents a new method, named EMRlog, for computer security systems in hospitals. EMRlog focuses on two kinds of security policies: directive and implemented policies. Security policies are applied to computer systems that handle huge amounts of information, such as databases, applications, and medical records. First, a syntactic verification step is applied using predicate logic. Then, data mining techniques are used to detect which security policies have actually been implemented by the computer systems staff. Subsequently, the consistency of both kinds of policies is verified, and these subsets are contrasted and validated. This is performed by an automatic theorem prover. In this way, many kinds of vulnerabilities can be removed to achieve a safer computer system.
Sergio Mauricio Martínez Monterrubio, Juan Frausto Solis, and Raúl Monroy Borja Copyright © 2015 Sergio Mauricio Martínez Monterrubio et al. All rights reserved.

Cellular Metabolic Network Analysis: Discovering Important Reactions in Treponema pallidum
Thu, 01 Oct 2015 11:46:40 +0000
T. pallidum, the syphilis-causing pathogen, differs markedly in metabolism from other bacterial pathogens. Developing a safe and effective syphilis vaccine requires identifying important steps in T. pallidum’s metabolism. Here, we apply flux balance analysis to represent the reactions quantitatively, which makes it possible to cluster all reactions in T. pallidum. Critical reactions are identified by calculating minimal cut sets and analyzing the topological structure of the metabolic network of T. pallidum. For comparison, we also apply these analytical approaches to the metabolic network of H. pylori. This comparison identifies coregulated drug targets and drug targets unique to each microorganism. Based on the clustering results, all reactions are further classified into various roles. This yields a general picture of the metabolic network. Two types of reactions, both involved in nucleic acid metabolism, are found to be essential for T. pallidum. We also found that both reaction hubs and isolated reactions in purine and pyrimidine metabolism play important roles in T. pallidum. These reactions could be potential drug targets for treating syphilis.
Xueying Chen, Min Zhao, and Hong Qu Copyright © 2015 Xueying Chen et al. All rights reserved.

Development of Self-Compressing BLSOM for Comprehensive Analysis of Big Sequence Data
Thu, 01 Oct 2015 07:26:59 +0000
With the remarkable increase in genomic sequence data from various organisms, novel tools are needed for comprehensive analyses of the available big sequence data. We previously developed a Batch-Learning Self-Organizing Map (BLSOM), which can cluster genomic fragment sequences according to phylotype solely on the basis of oligonucleotide composition. BLSOM has been applied in genomic and metagenomic studies. BLSOM is suitable for high-performance parallel computing and can analyze big data simultaneously, but a large-scale BLSOM requires large computational resources. We have developed Self-Compressing BLSOM (SC-BLSOM) to reduce computation time, which allows comprehensive analysis of big sequence data without high-performance supercomputers. The strategy of SC-BLSOM is to construct BLSOMs hierarchically according to data class, such as phylotype.
A first-layer BLSOM was constructed for each of the divided input data pieces, each representing a data subclass such as a phylotype division, which compresses the number of data pieces. The second-layer BLSOM was constructed from all the weight vectors obtained in the first-layer BLSOMs. We compared SC-BLSOM with the conventional BLSOM by analyzing bacterial genome sequences. SC-BLSOM could be constructed faster than BLSOM and clustered the sequences according to phylotype with high accuracy, showing the method’s suitability for efficient knowledge discovery from big sequence data.
Akihito Kikuchi, Toshimichi Ikemura, and Takashi Abe
Copyright © 2015 Akihito Kikuchi et al. All rights reserved.

Discovering Distinct Functional Modules of Specific Cancer Types Using Protein-Protein Interaction Networks
Thu, 01 Oct 2015 07:05:17 +0000
Background. The molecular profiles of different cancer types are very different; hence, discovering the distinct functional modules associated with specific cancer types is important for understanding their distinct functions. Protein-protein interaction networks carry vital information about molecular interactions in cellular systems, and identifying functional modules (subgraphs) in these networks is one of the most important applications of biological network analysis. Results. In this study, we developed a new graph-theory-based method to identify distinct functional modules from nine different cancer protein-protein interaction networks. The method consists of three major steps: (i) extracting modules from protein-protein interaction networks using network clustering algorithms; (ii) identifying distinct subgraphs from the derived modules; and (iii) identifying distinct subgraph patterns from the distinct subgraphs.
The subgraph patterns were evaluated using experimentally determined cancer-specific protein-protein interaction data from the Ingenuity knowledgebase to identify distinct functional modules specific to each cancer type. Conclusion. We identified cancer-type-specific subgraph patterns that may represent the functional modules involved in the molecular pathogenesis of different cancer types. Our method can serve as an effective tool for discovering cancer-type-specific functional modules from large protein-protein interaction networks.
Ru Shen, Xiaosheng Wang, and Chittibabu Guda
Copyright © 2015 Ru Shen et al. All rights reserved.

Development and Mining of a Volatile Organic Compound Database
Thu, 01 Oct 2015 06:59:32 +0000
Volatile organic compounds (VOCs) are small molecules that exhibit high vapor pressure under ambient conditions and have low boiling points. Although VOCs make up only a small proportion of the total metabolites produced by living organisms, they play an important role in chemical ecology, specifically in the biological interactions between organisms and ecosystems. VOCs are also important in health care, as they are currently used as biomarkers to detect various human diseases. Information on VOCs has so far been scattered across the literature, and no available database describes VOCs and their biological activities. To address this gap, we have developed the KNApSAcK Metabolite Ecology Database, which contains information on the relationships between VOCs and their emitting organisms. KNApSAcK Metabolite Ecology is also linked with KNApSAcK Core and the KNApSAcK Metabolite Activity Database to provide further information on the metabolites and their biological activities. The VOC database can be accessed online.
Azian Azamimi Abdullah, Md. Altaf-Ul-Amin, Naoaki Ono, Tetsuo Sato, Tadao Sugiura, Aki Hirai Morita, Tetsuo Katsuragi, Ai Muto, Takaaki Nishioka, and Shigehiko Kanaya
Copyright © 2015 Azian Azamimi Abdullah et al. All rights reserved.

Systematic Analysis of the Associations between Adverse Drug Reactions and Pathways
Thu, 01 Oct 2015 06:52:17 +0000
Adverse drug reactions (ADRs) are responsible for drug candidate failures during clinical trials, so it is crucial to investigate the biological pathways contributing to ADRs. Here, we applied a large-scale analysis to identify overrepresented ADR-pathway combinations by merging clinical phenotypic data, biological pathway data, and drug-target relations. Evaluation was performed through a review of the scientific literature and by defining a pathway-based ADR-ADR similarity measure. The results showed that our method is efficient at finding associations between ADRs and pathways. To understand the mechanisms of ADRs more systematically, we constructed an ADR-pathway network and an ADR-ADR network. Network analysis from biological and pharmacological perspectives showed that frequent ADRs were associated with more pathways than infrequent and rare ADRs. Moreover, environmental information processing pathways contributed most to the observed ADRs. Integrating the system organ class of ADRs, we found that most classes tended to interact with other classes rather than with themselves. ADR classes were distributed promiscuously across all the ADR cliques. These results indicate that drug perturbation of a certain pathway can cause changes in multiple organs rather than in one specific organ. Our work not only provides a global view of the associations between ADRs and pathways but also helps to understand the mechanisms of ADRs.
Xiaowen Chen, Yanqiu Wang, Pingping Wang, Baofeng Lian, Chunquan Li, Jing Wang, Xia Li, and Wei Jiang
Copyright © 2015 Xiaowen Chen et al. All rights reserved.
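The ADR study above identifies "overrepresented" ADR-pathway combinations but does not name the statistic it uses. One common way to score such over-representation is a one-sided hypergeometric test over drugs: among drugs with both ADR and target data, do drugs targeting a given pathway report a given ADR more often than chance? The sketch below assumes that test; the helper names (`hypergeom_sf`, `adr_pathway_pvalue`) and the drug-to-ADR and drug-to-pathway dictionaries are hypothetical inputs for illustration, not the authors' pipeline.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    # Terms with n - i > N - K are zero because math.comb returns 0 in that case.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def adr_pathway_pvalue(drug_adrs: dict, drug_pathways: dict, adr: str, pathway: str) -> float:
    """Hypothetical helper: one-sided over-representation p-value for one ADR-pathway pair.

    drug_adrs maps each drug to the set of ADRs reported for it (assumed input);
    drug_pathways maps each drug to the set of pathways containing its targets (assumed input).
    """
    drugs = set(drug_adrs) & set(drug_pathways)  # drugs with both kinds of data
    N = len(drugs)
    K = sum(adr in drug_adrs[d] for d in drugs)          # drugs reporting the ADR
    n = sum(pathway in drug_pathways[d] for d in drugs)  # drugs targeting the pathway
    k = sum(adr in drug_adrs[d] and pathway in drug_pathways[d] for d in drugs)  # overlap
    return hypergeom_sf(k, N, K, n)


if __name__ == "__main__":
    # Toy data invented purely for illustration
    drug_adrs = {"d1": {"nausea"}, "d2": {"nausea"}, "d3": set(), "d4": set()}
    drug_pathways = {"d1": {"p1"}, "d2": {"p1"}, "d3": set(), "d4": {"p2"}}
    print(adr_pathway_pvalue(drug_adrs, drug_pathways, "nausea", "p1"))
```

In a large-scale scan this test would be repeated for every ADR-pathway pair, so the resulting p-values would also need a multiple-testing correction before calling a combination overrepresented.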
METSP: A Maximum-Entropy Classifier Based Text Mining Tool for Transporter-Substrate Identification with Semistructured Text
Thu, 01 Oct 2015 06:50:59 +0000
The substrates of a transporter are not only useful for inferring the function of the transporter but also important for discovering compound-compound interactions and reconstructing metabolic pathways. Although plenty of data has accumulated with the development of new technologies such as in vitro transporter assays, the search for transporter substrates is far from complete. In this article, we introduce METSP, a maximum-entropy classifier devoted to retrieving transporter-substrate pairs (TSPs) from semistructured text. Based on the high-quality annotation from UniProt, METSP achieves high precision and recall in cross-validation experiments. When METSP is applied to 182,829 human transporter annotation sentences in UniProt, it identifies 3942 sentences with transporter and compound information. Finally, 1547 high-confidence human TSPs are identified for further manual curation, 58.37% of which involve novel substrates not annotated in public transporter databases. METSP is the first efficient tool for extracting TSPs from semistructured annotation text in UniProt. This tool can help to determine the precise substrates and drugs of transporters, thus facilitating drug-target prediction, metabolic network reconstruction, and literature classification.
Min Zhao, Yanming Chen, Dacheng Qu, and Hong Qu
Copyright © 2015 Min Zhao et al. All rights reserved.

A Glimpse to Background and Characteristics of Major Molecular Biological Networks
Wed, 30 Sep 2015 13:30:47 +0000
Recently, biology has become a data-intensive science because of the huge datasets produced by high-throughput molecular biological experiments in diverse areas, including genomics, transcriptomics, proteomics, and metabolomics. These huge datasets have paved the way for system-level analysis of the processes and subprocesses of the cell.
For system-level understanding, the elements of a system are first connected according to their mutual relations to form a network. Among omics researchers, the construction and analysis of biological networks have become highly popular. In this review, we briefly discuss both the biological background and the topological properties of major types of omics networks to facilitate a comprehensive understanding and to conceptualize the foundation of network biology.
Md. Altaf-Ul-Amin, Tetsuo Katsuragi, Tetsuo Sato, and Shigehiko Kanaya
Copyright © 2015 Md. Altaf-Ul-Amin et al. All rights reserved.

Bioinformatics Methods and Biological Interpretation for Next-Generation Sequencing Data
Mon, 07 Sep 2015 06:56:22 +0000
Guohua Wang, Yunlong Liu, Dongxiao Zhu, Gunnar W. Klau, and Weixing Feng
Copyright © 2015 Guohua Wang et al. All rights reserved.

MicroRNA Promoter Identification in Arabidopsis Using Multiple Histone Markers
Thu, 03 Sep 2015 13:14:36 +0000
A microRNA is a small noncoding RNA molecule that functions in RNA silencing and posttranscriptional regulation of gene expression. To understand how microRNA genes are activated, the promoter regions driving their expression must be annotated precisely. Only a fraction of microRNA genes have confirmed transcription start sites (TSSs), which hinders our understanding of transcription factor binding events. With the development of next-generation sequencing technology, chromatin states can be inferred precisely from combinations of specific histone modifications.
Using the genome-wide profiles of nine histone markers (H3K4me2, H3K4me3, H3K9Ac, H3K9me2, H3K18Ac, H3K27me1, H3K27me3, H3K36me2, and H3K36me3), we developed a computational strategy to identify the promoter regions of most microRNA genes in Arabidopsis, based on the assumption that the distribution of histone markers around the TSSs of microRNA genes is similar to that around the TSSs of protein-coding genes. Among 298 miRNA genes, our model identified 42 independent miRNA TSSs and 132 miRNA TSSs located in the promoters of upstream genes. The identification of these promoters will provide a better understanding of microRNA regulation and can play an important role in the study of diseases at the genetic level.
Yuming Zhao, Fang Wang, and Liran Juan
Copyright © 2015 Yuming Zhao et al. All rights reserved.

Constructing a Genome-Wide LD Map of Wild A. gambiae Using Next-Generation Sequencing
Thu, 03 Sep 2015 13:11:49 +0000
Anopheles gambiae is the major malaria vector in Africa. Examining the molecular basis of A. gambiae traits requires knowledge of both the genetic variation and the genome-wide linkage disequilibrium (LD) map of wild A. gambiae populations from malaria-endemic areas. We individually sequenced the genomes of nine wild A. gambiae mosquitoes using next-generation sequencing technologies and detected 2,219,815 common single nucleotide polymorphisms (SNPs), 88% of which are novel. SNPs are not evenly distributed across A. gambiae chromosomes. The low-SNP-frequency regions overlap heterochromatin and chromosome inversion domains, consistent with the lower recombination rates in these regions. Nearly one million SNPs that were genotyped correctly in all individual mosquitoes with 99.6% confidence were extracted from these high-throughput sequencing data. Based on these SNP genotypes, we constructed a genome-wide LD map for wild A. gambiae from malaria-endemic areas in Kenya and made it available through a public website.
The average size of LD blocks is less than 40 bp, and several large LD blocks were also found clustered around the para gene, which is consistent with the effect of insecticide-driven selective sweeps. The SNPs and the LD map will be valuable resources for the scientific community to dissect the A. gambiae genome.
Xiaohong Wang, Yaw A. Afrane, Guiyun Yan, and Jun Li
Copyright © 2015 Xiaohong Wang et al. All rights reserved.

Survey of Programs Used to Detect Alternative Splicing Isoforms from Deep Sequencing Data In Silico
Thu, 03 Sep 2015 11:55:27 +0000
Next-generation sequencing techniques have been emerging rapidly. However, the massive number of sequencing reads hides a great deal of unknown but important information. Advances have enabled researchers to discover alternative splicing (AS) sites and isoforms using computational approaches instead of molecular experiments. Given the importance of AS for gene expression and protein diversity in eukaryotes, detecting alternative splicing events and isoforms is a hot topic in systems biology and epigenetics research. The computational methods applied to AS prediction have improved since the emergence of next-generation sequencing. In this study, we introduce state-of-the-art research on AS and then compare the research methods and software tools available for AS analysis based on next-generation sequencing reads. Finally, we discuss the prospects of computational methods related to AS.
Feng Min, Sumei Wang, and Li Zhang
Copyright © 2015 Feng Min et al. All rights reserved.

Understanding Transcription Factor Regulation by Integrating Gene Expression and DNase I Hypersensitive Sites
Thu, 03 Sep 2015 09:24:16 +0000
Transcription factors are proteins that bind to DNA sequences to regulate gene transcription. Transcription factor binding sites are short DNA sequences (5–20 bp long) specifically bound by one or more transcription factors.
The identification of transcription factor binding sites and the prediction of their function remain challenging problems in computational biology. In this study, transcription factor binding sites in gene regulatory regions are identified by integrating DNase I hypersensitive sites with known position weight matrices from the TRANSFAC database. Based on the global gene expression patterns in cervical cancer HeLaS3 cells and HeLaS3-ifnα4h cells (HeLaS3 cells treated with interferon for 4 hours), we present a model-based computational approach to predict a set of transcription factors that potentially cause this differential gene expression. Notably, 6 of the 10 predicted functional factors (IRF, IRF-2, IRF-9, IRF-1, IRF-3, and ICSBP) belong to the interferon regulatory factor family and upregulate gene expression in response to the interferon treatment. Another factor, ISGF-3, is also a transcriptional activator induced by interferon alpha. The predictions of our model are consistent across different criteria for selecting transcription factor binding sites. Our model demonstrates the potential to computationally identify functional transcription factors in gene regulation.
Guohua Wang, Fang Wang, Qian Huang, Yu Li, Yunlong Liu, and Yadong Wang
Copyright © 2015 Guohua Wang et al. All rights reserved.

Active Microbial Communities Inhabit Sulphate-Methane Interphase in Deep Bedrock Fracture Fluids in Olkiluoto, Finland
Thu, 03 Sep 2015 09:23:57 +0000
Active microbial communities of deep crystalline bedrock fracture water from seven different boreholes in Olkiluoto (western Finland) were investigated using 454 pyrosequencing targeting bacterial and archaeal 16S rRNA, dsrB, and mcrA gene transcripts. Over a depth range of 296–798 m below ground surface, the microbial communities changed with depth, salinity gradient, and sulphate and methane concentrations.
The highest bacterial diversity was observed in the sulphate-methane mixing zone (SMMZ) at 250–350 m depth, whereas archaeal diversity was highest at the lower boundaries of the SMMZ. Sulphide-oxidizing ε-proteobacteria (Sulfurimonas sp.) dominated in the SMMZ, and γ-proteobacteria (Pseudomonas spp.) dominated below it. The active archaeal communities consisted mostly of ANME-2D and Thermoplasmatales groups, although Methermicoccaceae, Methanobacteriaceae, and Thermoplasmatales (SAGMEG, TMG) were more common at 415–559 m depth. Typical indicator microorganisms for sulphate-methane transition zones in marine sediments, such as ANME-1 archaea, α-, β-, and δ-proteobacteria, JS1, Actinomycetes, Planctomycetes, Chloroflexi, and MBGB Crenarchaeota, were detected at specific depths. DsrB genes were most numerous and most actively transcribed in the SMMZ, while the mcrA gene concentration was highest in the deep methane-rich groundwater. Our results demonstrate that active and highly diverse but sparse and stratified microbial communities inhabit the Fennoscandian deep bedrock ecosystems.
Malin Bomberg, Mari Nyyssönen, Petteri Pitkänen, Anne Lehtinen, and Merja Itävaara
Copyright © 2015 Malin Bomberg et al. All rights reserved.

454-Pyrosequencing Analysis of Bacterial Communities from Autotrophic Nitrogen Removal Bioreactors Utilizing Universal Primers: Effect of Annealing Temperature
Thu, 03 Sep 2015 09:22:02 +0000
The identification of anaerobic ammonium-oxidizing (anammox) bacteria by molecular tools aimed at evaluating bacterial diversity in autotrophic nitrogen removal systems is limited by the difficulty of designing universal primers for the domain Bacteria that can amplify anammox 16S rRNA genes.
A metagenomic analysis (pyrosequencing) of total bacterial diversity, including the anammox population, in five autotrophic nitrogen removal technologies, namely two bench-scale models (MBR and Low Temperature CANON) and three full-scale bioreactors (anammox, CANON, and DEMON), was successfully carried out by optimizing primer selection and PCR conditions (annealing temperature). The universal primer 530F was identified as the best candidate for coverage of both total bacterial and anammox bacterial diversity. The salt-adjusted optimum annealing temperature of primer 530F was calculated (47°C), and a range of annealing temperatures of 44–49°C was therefore tested. Pyrosequencing data showed that an annealing temperature of 45°C yielded the best results in terms of species richness and diversity for all bioreactors analyzed.
Alejandro Gonzalez-Martinez, Alejandro Rodriguez-Sanchez, Belén Rodelas, Ben A. Abbas, Maria Victoria Martinez-Toledo, Mark C. M. van Loosdrecht, F. Osorio, and Jesus Gonzalez-Lopez
Copyright © 2015 Alejandro Gonzalez-Martinez et al. All rights reserved.

mmnet: An R Package for Metagenomics Systems Biology Analysis
Thu, 03 Sep 2015 09:22:02 +0000
The human microbiome plays important roles in human health and disease. Previous microbiome studies focused mainly on the function of single pure species and overlooked system-level interactions in complex communities. A recently introduced metagenomic approach integrates metagenomic data with community-level metabolic network modeling, but no comprehensive tool was available for this kind of approach. To facilitate such studies, we developed an R package, mmnet, to implement community-level metabolic network reconstruction. The package also implements a set of functions for automatic analysis pipeline construction, including functional annotation of metagenomic reads, abundance estimation of enzymatic genes, community-level metabolic network reconstruction, and integrated network analysis.
The results can be represented intuitively and sent to Cytoscape for further exploration. The package has substantial potential for metagenomic studies that focus on identifying system-level variations of the human microbiome associated with disease.
Yang Cao, Xiaofei Zheng, Fei Li, and Xiaochen Bo
Copyright © 2015 Yang Cao et al. All rights reserved.

Genetic Interactions Explain Variance in Cingulate Amyloid Burden: An AV-45 PET Genome-Wide Association and Interaction Study in the ADNI Cohort
Thu, 03 Sep 2015 09:20:58 +0000
Alzheimer’s disease (AD) is the most common neurodegenerative disorder. Using discrete disease status as the phenotype and computing statistics at the single-marker level may not capture the underlying biological interactions that contribute to the disease mechanism and may contribute to the issue of “missing heritability.” We performed a genome-wide association study (GWAS) and a genome-wide interaction study (GWIS) of an amyloid imaging phenotype using data from the Alzheimer’s Disease Neuroimaging Initiative. We investigated the genetic main effects and interaction effects on cingulate amyloid-beta (Aβ) load in an effort to better understand the genetic etiology of Aβ deposition, a widely studied AD biomarker. PLINK was used for the single-marker GWAS, and INTERSNP was used to perform the two-marker GWIS, focusing only on SNPs with for the GWAS analysis. Age, sex, and diagnosis were used as covariates in both analyses. Bonferroni-corrected p values were reported. The GWAS analysis revealed significant hits within or proximal to the APOE, APOC1, and TOMM40 genes, which were previously implicated in AD. The GWIS analysis yielded 8 novel SNP-SNP interaction findings that warrant replication and further investigation.
Jin Li, Qiushi Zhang, Feng Chen, Jingwen Yan, Sungeun Kim, Lei Wang, Weixing Feng, Andrew J. Saykin, Hong Liang, and Li Shen
Copyright © 2015 Jin Li et al. All rights reserved.
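The amyloid study above reports Bonferroni-corrected p values for both a single-marker GWAS and a two-marker GWIS. Bonferroni correction multiplies each raw p value by the number of tests m (capping at 1), which is equivalent to comparing raw p values against α/m. For an exhaustive pairwise scan over s SNPs, m = s(s−1)/2, so interaction studies face a far stricter per-test threshold than single-marker scans. The sketch below only illustrates that arithmetic: the abstract does not state how many SNPs or tests were used, and the function names and the 1,000-SNP figure are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (adjusted p values, significance flags) under Bonferroni correction.

    Each p value is multiplied by the number of tests m and capped at 1.0;
    a test is flagged significant when its raw p value is below alpha / m.
    """
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    significant = [p < alpha / m for p in p_values]
    return adjusted, significant


def pairwise_test_count(num_snps):
    """Number of distinct SNP-SNP pairs in an exhaustive two-marker scan (hypothetical helper)."""
    return num_snps * (num_snps - 1) // 2


if __name__ == "__main__":
    adj, sig = bonferroni([0.001, 0.02, 0.4])
    print(adj, sig)
    # Per-test threshold for an exhaustive scan over a hypothetical 1,000 SNPs
    m = pairwise_test_count(1000)
    print(m, 0.05 / m)
```

Bonferroni is simple and controls the family-wise error rate, but it is conservative when tests are correlated, as neighbouring SNPs in LD often are.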
How to Isolate a Plant’s Hypomethylome in One Shot
Thu, 03 Sep 2015 09:14:51 +0000
Genome assembly remains a challenge for large and/or complex plant genomes because of their abundant repetitive regions, which has led studies to focus on gene space instead of the whole genome. DNA enrichment strategies therefore facilitate assembly by increasing coverage while simultaneously reducing the complexity of the whole genome. In this paper, we provide an easy, fast, and cost-effective variant of MRE-seq to obtain a plant’s hypomethylome through an optimized methyl filtration protocol followed by next-generation sequencing. The method is demonstrated on three plant species known to have large and/or complex (polyploid) genomes: Oryza sativa, Picea abies, and Crocus sativus. The identified hypomethylomes show clear enrichment for genes and their flanking regions and a clear reduction of transposable elements. Additionally, genomic sequences around genes are captured, including regulatory elements in introns and in up- and downstream flanks. The high similarity between the results of a de novo assembly approach and reference-based mapping in rice supports the method’s applicability for studying and understanding the genomes of nonmodel organisms. Hence, we show the high potential of MRE-seq in a wide range of scenarios for the direct analysis of methylation differences, for example, between ecotypes or individuals, and within or across species harbouring large and complex genomes.
Elisabeth Wischnitzki, Eva Maria Sehr, Karin Hansel-Hohl, Maria Berenyi, Kornel Burg, and Silvia Fluch
Copyright © 2015 Elisabeth Wischnitzki et al. All rights reserved.

Data Acquisition and Processing in Biology and Medicine
Wed, 26 Aug 2015 10:13:46 +0000
Cheng-Hong Yang, Yu-Jie Huang, An Liu, Yi Rong, and Tsair-Fwu Lee
Copyright © 2015 Cheng-Hong Yang et al. All rights reserved.
The Combinational Polymorphisms of ORAI1 Gene Are Associated with Preventive Models of Breast Cancer in the Taiwanese
Tue, 25 Aug 2015 14:02:28 +0000
The ORAI calcium release-activated calcium modulator 1 (ORAI1) has been shown to be an important gene in breast cancer progression and metastasis. However, protective association models involving the single nucleotide polymorphisms (SNPs) of the ORAI1 gene have not been investigated. Based on a published dataset of 345 female breast cancer patients and 290 female controls, we used a particle swarm optimization (PSO) algorithm to identify possible protective models of breast cancer association in terms of the SNPs of the ORAI1 gene. Results showed that the PSO-generated models of 2-SNP (rs12320939-TT/rs12313273-CC), 3-SNP (rs12320939-TT/rs12313273-CC/rs712853-(TT/TC)), 4-SNP (rs12320939-TT/rs12313273-CC/rs7135617-(GG/GT)/rs712853-(TT/TC)), and 5-SNP (rs12320939-TT/rs12313273-CC/rs7135617-(GG/GT)/rs6486795-CC/rs712853-(TT/TC)) combinations displayed low odds ratios (0.409–0.425) for breast cancer association. Taken together, these results suggest that our proposed PSO strategy is powerful for identifying combinations of the SNPs rs12320939, rs12313273, rs7135617, rs6486795, and rs712853 of the ORAI1 gene with a strongly protective association in breast cancer.
Fu Ou-Yang, Yu-Da Lin, Li-Yeh Chuang, Hsueh-Wei Chang, Cheng-Hong Yang, and Ming-Feng Hou
Copyright © 2015 Fu Ou-Yang et al. All rights reserved.

Automatic Artifact Removal from Electroencephalogram Data Based on A Priori Artifact Information
Tue, 25 Aug 2015 08:22:17 +0000
Electroencephalogram (EEG) recordings are susceptible to various nonneural physiological artifacts. Automatic artifact removal from EEG data remains a key challenge for extracting relevant information on brain activity. To adapt to variable subjects and EEG acquisition environments, this paper presents an automatic online artifact removal method based on a priori artifact information.
The combination of discrete wavelet transform and independent component analysis (ICA), wavelet-ICA, was used to separate artifact components. The artifact components were then automatically identified using a priori artifact information acquired in advance. Subsequently, the signals were reconstructed without the artifact components to obtain artifact-free signals. The results showed that, with this automatic online artifact removal method, classification accuracies improved statistically significantly in both experiments, namely motor imagery and emotion recognition.
Chi Zhang, Li Tong, Ying Zeng, Jingfang Jiang, Haibing Bu, Bin Yan, and Jianxin Li
Copyright © 2015 Chi Zhang et al. All rights reserved.

Tennis Elbow Diagnosis Using Equivalent Uniform Voltage to Fit the Logistic and the Probit Diseased Probability Models
Tue, 25 Aug 2015 07:46:17 +0000
This study aimed to develop logistic and probit models to analyse the electromyographic (EMG) equivalent uniform voltage (EUV) response for the tenderness of tennis elbow. In total, 78 hands from 39 subjects were enrolled. The surface EMG (sEMG) signal was obtained by an innovative device with electrodes over the forearm region. The analytical endpoint was defined as Visual Analog Score (VAS) 3+ tenderness of tennis elbow. The logistic and probit diseased probability (DP) models were established for the VAS score and the EMG absolute voltage-time histograms (AVTH). TV50 is the threshold equivalent uniform voltage predicting a 50% risk of disease. Twenty-one of the 78 samples (27%) developed VAS 3+ tenderness of tennis elbow, as reported by the subject and confirmed by the physician. The fitted DP parameters were TV50 = 153.0 mV (CI: 136.3–169.7 mV) and γ50 = 0.84 (CI: 0.78–0.90) for the logistic model, and TV50 = 155.6 mV (CI: 138.9–172.4 mV) and m = 0.54 (CI: 0.49–0.59) for the probit model. When EUV ≥ 153 mV, the patient’s DP is at least 50%; below this value, it is less than 50%.
The logistic and probit models are valuable tools for predicting the DP of VAS 3+ tenderness of tennis elbow.
Tsair-Fwu Lee, Wei-Chun Lin, Hung-Yu Wang, Shu-Yuan Lin, Li-Fu Wu, Shih-Sian Guo, Hsiang-Jui Huang, Hui-Min Ting, and Pei-Ju Chao
Copyright © 2015 Tsair-Fwu Lee et al. All rights reserved.

A Data Hiding Technique to Synchronously Embed Physiological Signals in H.264/AVC Encoded Video for Medicine Healthcare
Tue, 25 Aug 2015 07:45:41 +0000
The recognition of clinical manifestations in both video images and physiological-signal waveforms is an important aid to improving safety and effectiveness in medical care. Physicians can rely on video-waveform (VW) observations to recognize difficult-to-spot signs and symptoms. VW observations can also reduce the number of false-positive incidents and expand recognition coverage to abnormal health conditions. Synchronization between the video images and the physiological-signal waveforms is fundamental to the successful recognition of clinical manifestations. Using conventional equipment to synchronously acquire and display video-waveform information involves complex tasks such as video capture/compression, acquisition/compression of each physiological signal, and timestamp-based video-waveform synchronization. This paper introduces a data hiding technique capable of both enabling embedding channels and synchronously hiding samples of physiological signals in encoded video sequences. Our data hiding technique offers a large data capacity and reduces the complexity of video-waveform acquisition and reproduction. The experimental results showed successful embedding and full restoration of the signal samples. Our results also demonstrated a small distortion in objective video quality, a small increase in bit rate, and embedded cost savings of −2.6196% for high- and medium-motion video sequences.
Raul Peña, Alfonso Ávila, David Muñoz, and Juan Lavariega
Copyright © 2015 Raul Peña et al. All rights reserved.

Information-Theoretical Quantifier of Brain Rhythm Based on Data-Driven Multiscale Representation
Mon, 24 Aug 2015 14:18:37 +0000
This paper presents a data-driven multiscale entropy measure to reveal the scale-dependent information quantity of electroencephalogram (EEG) recordings. This work is motivated by previous observations of the nonlinear and nonstationary nature of EEG over multiple time scales. Here, a new framework of entropy measures that accounts for changing dynamics over multiple oscillatory scales is presented. First, to deal with nonstationarity over multiple scales, the EEG recording is decomposed using empirical mode decomposition (EMD), which is known to be effective for extracting the constituent narrowband components without a predetermined basis. Calculating the Renyi entropy of the probability distributions of the intrinsic mode functions extracted by EMD then yields a data-driven multiscale Renyi entropy. To validate the performance of the proposed entropy measure, actual EEG recordings from rats that experienced 7 min of cardiac arrest followed by resuscitation were analyzed. Simulation and experimental results demonstrate that the multiscale Renyi entropy yields better discrimination of injury levels and improved correlation with the neurological deficit evaluation 72 hours after cardiac arrest, suggesting that it is an effective diagnostic and prognostic tool.
Young-Seok Choi
Copyright © 2015 Young-Seok Choi. All rights reserved.

Gene Network Analysis of Glucose Linked Signaling Pathways and Their Role in Human Hepatocellular Carcinoma Cell Growth and Survival in HuH7 and HepG2 Cell Lines
Mon, 24 Aug 2015 11:19:10 +0000
Cancer progression may be affected by metabolism.
In this study, we aimed to analyze the effect of glucose on the proliferation and/or survival of human hepatocellular carcinoma (HCC) cells. Human gene datasets regulated by glucose were compared with gene datasets either dysregulated in HCC or regulated by other signaling pathways. Significant numbers of shared genes suggested putative involvement of these pathways in transcriptional regulation by glucose. Real-time proliferation assays using high (4.5 g/L) versus low (1 g/L) glucose on two human HCC cell lines, together with specific inhibitors of selected pathways, were used for experimental validation. High glucose promoted the proliferation of HuH7 cells but not that of the HepG2 cell line. Gene network analyses suggest that 92% of glucose-regulated gene transcription could be mediated through ChREBP in HepG2 cells, compared with 40% in either other human cells or healthy rodent liver. This was accompanied by alteration of the LKB1 (serine/threonine kinase 11) and NOX (NADPH oxidases) signaling pathways and by loss of the transcriptional regulation of PPARGC1A (peroxisome-proliferator activated receptors gamma coactivator 1) target genes by high glucose. Both PPARA and PPARGC1A regulate transcription of genes commonly regulated by glycolysis, by the antidiabetic agent metformin, and by NOX, suggesting their major interplay in the control of HCC progression. Emmanuelle Berger, Nathalie Vega, Michèle Weiss-Gayet, and Alain Géloën Copyright © 2015 Emmanuelle Berger et al. All rights reserved. Applying NGS Data to Find Evolutionary Network Biomarkers from the Early and Late Stages of Hepatocellular Carcinoma Thu, 20 Aug 2015 07:07:01 +0000 Hepatocellular carcinoma (HCC) is the major type of liver tumor (~80%), alongside hepatoblastomas, angiosarcomas, and cholangiocarcinomas. In this study, we used a systems biology approach to construct protein-protein interaction networks (PPINs) for early-stage and late-stage liver cancer.
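The glucose study above reports "significant numbers of common genes" between gene datasets but does not name the test used. One common choice, assumed here for illustration, is a one-sided hypergeometric test. It asks whether two gene lists drawn from the same gene universe share more genes than chance would predict. The function name and its arguments are hypothetical.

```python
from math import comb


def overlap_pvalue(universe: int, size_a: int, size_b: int, overlap: int) -> float:
    """One-sided hypergeometric p-value P(X >= overlap).

    X is the number of genes shared by list A (size_a genes) and a random
    list B (size_b genes) drawn without replacement from `universe` genes.
    """
    if not (0 <= size_a <= universe and 0 <= size_b <= universe):
        raise ValueError("list sizes must lie between 0 and the universe size")
    upper = min(size_a, size_b)
    # comb(n, k) returns 0 when k > n, so impossible configurations add nothing
    tail = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(max(overlap, 0), upper + 1)
    )
    return tail / comb(universe, size_b)


# Hypothetical numbers: 20,000 genes, 500 glucose-regulated genes,
# 300 HCC-dysregulated genes, 30 genes shared by both lists.
print(overlap_pvalue(20000, 500, 300, 30))
```

Python's exact integer arithmetic in `math.comb` avoids the underflow that naive factorial formulas hit for genome-sized universes. The final true division still returns an ordinary float.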
By comparing the networks of these two stages, we found that they share some mechanisms and differ significantly in others. To obtain differential network structures between cancer and noncancer PPINs, we constructed cancer and noncancer PPIN structures for the two stages of liver cancer by a systems biology method using NGS data from cancer cells and adjacent noncancer cells. Using carcinogenesis relevance values (CRVs), we identified 43 and 80 significant proteins and their PPINs (network markers) for early-stage and late-stage liver cancer, respectively. To investigate the evolution of network biomarkers during carcinogenesis, we performed a primary pathway analysis, which showed that the pathways common to the early and late stages were those related to ordinary cancer mechanisms. A pathway specific to the early stage was the mismatch repair pathway, while pathways specific to the late stage were the spliceosome, lysine degradation, and progesterone-mediated oocyte maturation pathways. This study provides a new direction for cancer-targeted therapies at different stages. Yung-Hao Wong, Chia-Chou Wu, Chih-Lung Lin, Ting-Shou Chen, Tzu-Hao Chang, and Bor-Sen Chen Copyright © 2015 Yung-Hao Wong et al. All rights reserved. The ABCC6 Transporter as a Paradigm for Networking from an Orphan Disease to Complex Disorders Tue, 18 Aug 2015 09:35:55 +0000 Knowledge of the genetic etiology of complex disorders has largely come from the study of rare monogenic disorders. These common and rare diseases often show phenotypic overlap, although monogenic diseases generally have more extreme symptoms. ABCC6 is the gene responsible for pseudoxanthoma elasticum, an autosomal recessive ectopic mineralization disorder. It can be considered a paradigm gene whose relevance reaches far beyond this enigmatic orphan disease. Indeed, common traits such as chronic kidney disease and cardiovascular disorders have been linked to the ABCC6 gene.
Although awareness of the wide ramifications of ABCC6 has increased significantly during the last decade, the gene itself and the transmembrane transporter it encodes have not yet revealed all of their mysteries. To gain more insight, multiple approaches are being used, including next-generation sequencing, computational methods, and various “omics” technologies. Much effort is being made to place the vast amount of data being gathered into an integrated systems-biology network. The involvement of ABCC6 in common disorders gives a good view of the wide implications and potential of such a network. In this review, we summarize the network approaches used to study ABCC6 and the role of this gene in several complex diseases. Eva Y. G. De Vilder, Mohammad Jakir Hosen, and Olivier M. Vanakker Copyright © 2015 Eva Y. G. De Vilder et al. All rights reserved. An Affinity Propagation-Based DNA Motif Discovery Algorithm Mon, 10 Aug 2015 09:57:56 +0000 The planted motif search (PMS) is a fundamental problem in bioinformatics that plays an important role in locating transcription factor binding sites (TFBSs) in DNA sequences. Identifying weak motifs and reducing the effect of local optima remain important but challenging tasks in motif discovery. To address them, we propose a new algorithm, APMotif. It first applies Affinity Propagation (AP) clustering to DNA sequences to produce informative, high-quality candidate motifs, and then employs Expectation Maximization (EM) refinement to obtain the optimal motifs from the candidates. Experimental results on both simulated and real biological data sets show that APMotif usually outperforms four other widely used algorithms in prediction accuracy. Chunxiao Sun, Hongwei Huo, Qiang Yu, Haitao Guo, and Zhigang Sun Copyright © 2015 Chunxiao Sun et al. All rights reserved.
Statistical Analysis of High-Dimensional Genetic Data in Complex Traits Tue, 04 Aug 2015 14:52:57 +0000 Taesung Park, Kristel Van Steen, Xiang-Yang Lou, and Momiao Xiong Copyright © 2015 Taesung Park et al. All rights reserved. Evaluation of Penalized and Nonpenalized Methods for Disease Prediction with Large-Scale Genetic Data Tue, 04 Aug 2015 11:27:26 +0000 Owing to recent improvements in genotyping technology, large-scale genetic data can be used to identify disease susceptibility loci, and these findings have substantially improved our understanding of complex diseases. Despite these successes, however, most genetic effects for many complex diseases were found to be very small, which has been a major hurdle to building disease prediction models. Recently, many statistical methods based on penalized regression have been proposed to tackle the so-called “large P and small N” problem. Penalized regressions, including the least absolute shrinkage and selection operator (LASSO) and ridge regression, limit the parameter space, and this constraint makes it possible to estimate the effects of a very large number of SNPs. Various extensions have been suggested, and, in this report, we compare their accuracy by applying them to several complex diseases. Our results show that penalized regressions are usually robust and provide better accuracy than the existing methods, at least for the diseases under consideration. Sungho Won, Hosik Choi, Suyeon Park, Juyoung Lee, Changyi Park, and Sunghoon Kwon Copyright © 2015 Sungho Won et al. All rights reserved. Detection of Epistatic and Gene-Environment Interactions Underlying Three Quality Traits in Rice Using High-Throughput Genome-Wide Data Tue, 04 Aug 2015 11:23:29 +0000 With the development of sequencing technology, dense single nucleotide polymorphisms (SNPs) have become available, enabling the genetic architecture of complex traits to be uncovered by genome-wide association studies (GWAS).
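For the penalized regressions compared in the disease-prediction study above, the LASSO can be fitted by cyclic coordinate descent with soft-thresholding. The sketch below is a textbook version, not the authors' implementation. It minimizes (1/2n)‖y − Xβ‖² + λ‖β‖₁ and assumes a centred phenotype y, so no intercept is fitted. It also runs a fixed number of sweeps instead of checking convergence, and its function names are hypothetical.

```python
from typing import List, Sequence


def soft_threshold(z: float, gamma: float) -> float:
    """Shrink z toward zero by gamma, returning exactly zero inside [-gamma, gamma]."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X: Sequence[Sequence[float]], y: Sequence[float], lam: float,
             n_iter: int = 200) -> List[float]:
    """Cyclic coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, with beta = 0 at the start
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:  # monomorphic SNP: its effect stays at zero
                continue
            # correlation of column j with the partial residual that excludes SNP j
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Toy data with two orthogonal predictors: the weak one is set exactly to zero.
X = [[1, 0], [0, 1], [-1, 0], [0, -1]]
y = [2.0, 0.1, -2.0, -0.1]
print(lasso_cd(X, y, lam=0.2))
```

Because the soft-threshold sets small coefficients exactly to zero, the LASSO performs variable selection when P ≫ N. Ridge regression, by contrast, shrinks every effect but never zeroes one.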
However, the current GWAS strategy usually ignores epistatic and gene-environment interactions because appropriate methodology is lacking and the computational burden is heavy. This study proposes a new GWAS strategy that combines the graphics processing unit- (GPU-) based generalized multifactor dimensionality reduction (GMDR) algorithm with a mixed linear model approach. The reliability and efficiency of the analytical methods were verified through Monte Carlo simulations, which suggested that a population of nearly 150 recombinant inbred lines (RILs) gave reasonable resolution for the scenarios considered. A GWAS was then conducted with this two-step strategy to investigate the additive, epistatic, and gene-environment associations between 701,867 SNPs and three important quality traits (gelatinization temperature, amylose content, and gel consistency). The population was 138 RILs derived from the super-hybrid rice Xieyou9308, grown in two environments. Four significant SNPs were identified with additive, epistatic, and gene-environment interaction effects. Our study showed that the mixed linear model approach combined with the GPU-based GMDR algorithm is a feasible strategy for implementing GWAS to uncover the genetic architecture of complex traits in crops. Haiming Xu, Beibei Jiang, Yujie Cao, Yingxin Zhang, Xiaodeng Zhan, Xihong Shen, Shihua Cheng, Xiangyang Lou, and Liyong Cao Copyright © 2015 Haiming Xu et al. All rights reserved. Systems Biology Approaches to Mining High Throughput Biological Data Tue, 04 Aug 2015 11:22:20 +0000 Fang-Xiang Wu, Min Li, Jishou Ruan, and Feng Luo Copyright © 2015 Fang-Xiang Wu et al. All rights reserved. Dynamic Model for RNA-seq Data Analysis Tue, 04 Aug 2015 11:20:34 +0000 By measuring messenger RNA levels for all genes in a sample, RNA-seq provides an attractive option to characterize global changes in transcription. RNA-seq is becoming a widely used platform for gene expression profiling.
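The GMDR step of the two-step rice GWAS strategy above generalizes multifactor dimensionality reduction (MDR). The core MDR idea is to pool multilocus genotype cells into "high-risk" and "low-risk" groups. The sketch below shows plain MDR for two SNPs and a case-control status, an assumption made for illustration. GMDR instead scores cells with residuals from a generalized linear model, so it can handle covariates and quantitative traits, and real analyses add cross-validation. The function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple

Cell = Tuple[int, int]


def mdr_cells(g1: Sequence[int], g2: Sequence[int], status: Sequence[int]) -> Dict[Cell, bool]:
    """Label each two-locus genotype cell high-risk (True) or low-risk (False).

    Genotypes are coded 0/1/2; status is 1 for cases and 0 for controls.
    A cell is high-risk when its case:control ratio exceeds the overall ratio.
    """
    cases: Dict[Cell, int] = {}
    controls: Dict[Cell, int] = {}
    for a, b, s in zip(g1, g2, status):
        target = cases if s == 1 else controls
        target[(a, b)] = target.get((a, b), 0) + 1
    n_case, n_ctrl = sum(cases.values()), sum(controls.values())
    if n_case == 0 or n_ctrl == 0:
        raise ValueError("need both cases and controls")
    labels = {}
    for cell in set(cases) | set(controls):
        ca, co = cases.get(cell, 0), controls.get(cell, 0)
        # ca/co > n_case/n_ctrl, rewritten to avoid dividing by a zero control count
        labels[cell] = ca * n_ctrl > co * n_case
    return labels


def mdr_accuracy(g1: Sequence[int], g2: Sequence[int], status: Sequence[int],
                 labels: Dict[Cell, bool]) -> float:
    """Fraction of subjects whose cell label (high-risk = case) matches their status."""
    hits = sum(labels[(a, b)] == (s == 1) for a, b, s in zip(g1, g2, status))
    return hits / len(status)


g1 = [0, 0, 1, 1, 2, 2]
g2 = [0, 1, 0, 1, 2, 2]
status = [1, 0, 0, 1, 1, 0]
labels = mdr_cells(g1, g2, status)
print(mdr_accuracy(g1, g2, status, labels))
```

Collapsing the nine genotype cells into one binary variable is what makes exhaustive pairwise scans over hundreds of thousands of SNPs tractable. That is also why GPU acceleration pays off for GMDR.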
However, real transcription signals in RNA-seq data are confounded with measurement and sequencing errors and with other random biological/technical variation. To extract the biologically useful transcription process from RNA-seq data, we propose to model the data with a second-order ordinary differential equation (ODE). We use differential principal analysis to develop statistical methods for estimating the location-varying coefficients of the ODE. Clique-Based Clustering of Correlated SNPs in a Gene Can Improve Performance of Gene-Based Multi-Bin Linear Combination Test Tue, 04 Aug 2015 10:59:47 +0000 Gene-based analysis of multiple single nucleotide polymorphisms (SNPs) in a gene region is an alternative to single-SNP analysis. The multi-bin linear combination test (MLC) proposed in previous studies uses the correlation among SNPs within a gene to construct a gene-based global test. SNPs are partitioned into clusters of highly correlated SNPs, and the MLC test statistic quadratically combines the linear combination statistics constructed for each cluster. The test has degrees of freedom equal to the number of clusters and can be more powerful than a fully quadratic or fully linear test statistic. In this study, we develop a new SNP clustering algorithm designed to find cliques, which are complete subnetworks of SNPs with all pairwise correlations above a threshold. We evaluate the performance of the MLC test using the clique-based CLQ algorithm versus the tag-SNP-based LDSelect algorithm. In our numerical power calculations, the two clustering algorithms produced identical clusters about 40–60% of the time, yielding similar power on average. However, because the CLQ algorithm tends to produce smaller clusters with stronger positive correlation, the MLC test is less likely to be affected by opposing signs in the individual SNP effect coefficients. Yun Joo Yoo, Sun Ah Kim, and Shelley B. Bull Copyright © 2015 Yun Joo Yoo et al.
All rights reserved. Identifying and Assessing Interesting Subgroups in a Heterogeneous Population Mon, 03 Aug 2015 13:21:57 +0000 Biological heterogeneity is common in many diseases, and it is often the reason for therapeutic failures. Thus, there is great interest in classifying a disease into subtypes that have clinical significance in terms of prognosis or therapy response. One of the most popular methods to uncover unrecognized subtypes is cluster analysis. However, classical clustering methods such as k-means clustering or hierarchical clustering are not guaranteed to produce clinically interesting subtypes. This could be because the main statistical variability, the basis of cluster generation, is dominated by genes not associated with the clinical phenotype of interest. Furthermore, a strong prognostic factor might be relevant for a certain subgroup but not for the whole population; thus an analysis of the whole sample may not reveal this prognostic factor. To address these problems, we investigate methods to identify and assess clinically interesting subgroups in a heterogeneous population. The identification step uses a clustering algorithm, and significance is assessed with a false discovery rate- (FDR-) based measure. Under heterogeneity, the standard FDR estimate is shown to overestimate the true FDR, but this is remedied by an improved FDR estimation procedure. As illustrations, two real data examples from gene expression studies of lung cancer are provided. Woojoo Lee, Andrey Alexeyenko, Maria Pernemalm, Justine Guegan, Philippe Dessen, Vladimir Lazar, Janne Lehtiö, and Yudi Pawitan Copyright © 2015 Woojoo Lee et al. All rights reserved. Detecting Genetic Interactions for Quantitative Traits Using -Spacing Entropy Measure Mon, 03 Aug 2015 13:10:36 +0000 A number of statistical methods for detecting gene-gene interactions have been developed in genetic association studies with binary traits.
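The clique-based SNP clustering behind the MLC test described above depends on one defining property: every pair of SNPs within a cluster has a correlation above the threshold. The greedy procedure below guarantees that property. It is only a hypothetical illustration and not the CLQ algorithm itself, which the abstract does not specify in detail. Comparing signed correlations with the threshold is also an assumption.

```python
from typing import List, Sequence


def greedy_cliques(corr: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Partition SNP indices so that every pair inside a cluster has corr >= threshold."""
    unassigned = list(range(len(corr)))
    clusters: List[List[int]] = []
    while unassigned:
        seed = unassigned.pop(0)
        clique = [seed]
        for j in list(unassigned):  # iterate over a copy while removing from the original
            # j joins only if it is correlated with *every* current member
            if all(corr[j][k] >= threshold for k in clique):
                clique.append(j)
                unassigned.remove(j)
        clusters.append(clique)
    return clusters


corr = [
    [1.00, 0.90, 0.85, 0.10],
    [0.90, 1.00, 0.30, 0.10],
    [0.85, 0.30, 1.00, 0.90],
    [0.10, 0.10, 0.90, 1.00],
]
print(greedy_cliques(corr, threshold=0.8))  # [[0, 1], [2, 3]]
```

SNP 2 is strongly correlated with SNP 0 but not with SNP 1, so the clique requirement keeps it out of the first cluster. This is how clique-based clustering yields smaller, more tightly correlated bins than grouping around a single tag SNP.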
However, many phenotype measures are intrinsically quantitative, and categorizing continuous traits may not always be straightforward or meaningful. ProSim: A Method for Prioritizing Disease Genes Based on Protein Proximity and Disease Similarity Mon, 03 Aug 2015 10:49:13 +0000 Predicting disease genes for a particular genetic disease is very challenging in bioinformatics. Current research suggests that this challenge can be tackled with network-based approaches. Furthermore, it has been highlighted that disease similarity must be considered along with a protein’s proximity to disease genes in a protein-protein interaction (PPI) network in order to improve the accuracy of disease gene prioritization. In this study, we propose a new algorithm called the proximity disease similarity algorithm (ProSim), which takes both of these properties into consideration, to prioritize disease genes. To illustrate the proposed algorithm, we conducted six case studies, namely, prostate cancer, Alzheimer’s disease, diabetes mellitus type 2, breast cancer, colorectal cancer, and lung cancer. Module Based Differential Coexpression Analysis Method for Type 2 Diabetes Mon, 03 Aug 2015 07:40:09 +0000 More and more studies have shown that many complex diseases are caused jointly by alterations in numerous genes. Genes often act together in a functional biological pathway or network and are highly correlated. Differential coexpression analysis, a more comprehensive technique than differential expression analysis, was proposed to study the gene regulatory networks and biological pathways underlying phenotypic changes. It does so by measuring changes in gene correlation between disease and normal conditions. In this paper, we propose a gene differential coexpression analysis algorithm at the level of gene sets and apply it to a publicly available type 2 diabetes (T2D) expression dataset.
First, we calculate biweight midcorrelation coefficients of coexpression between all gene pairs. Then, we select informative correlation pairs using the “differential coexpression threshold” strategy. Finally, we identify the differential coexpression gene modules using the maximum clique concept and the k-clique algorithm. We apply the proposed differential coexpression analysis method to simulated data and T2D data. Two differential coexpression gene modules related to T2D were detected, which should be useful for exploring the biological function of the related genes. Lin Yuan, Chun-Hou Zheng, Jun-Feng Xia, and De-Shuang Huang Copyright © 2015 Lin Yuan et al. All rights reserved. Improving Classification of Protein Interaction Articles Using Context Similarity-Based Feature Selection Mon, 03 Aug 2015 07:26:06 +0000 Protein interaction article classification is a text classification task in the biological domain that determines which articles describe protein-protein interactions. Since the feature space in text classification is high-dimensional, feature selection is widely used to reduce the dimensionality of features and speed up computation without sacrificing classification performance. Many existing feature selection methods are based on statistical measures of document frequency and term frequency. One potential drawback of these methods is that they treat features separately. Hence, we first design a similarity measure between context information that takes word cooccurrences and phrase chunks around the features into account. We then introduce this context similarity into the importance measure of the features in place of document and term frequency. On this basis, we propose new context similarity-based feature selection methods. Their performance is evaluated on two protein interaction article collections and compared against the frequency-based methods.
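The biweight midcorrelation used as the first step of the T2D method above is a robust alternative to Pearson correlation. Each profile is median-centred, and each value gets the weight w_i = (1 − u_i²)² when |u_i| < 1 and 0 otherwise, where u_i = (x_i − med x)/(9 · mad x) and mad is the unscaled median absolute deviation. The weighted deviations are normalized to unit length, and the correlation is their dot product. The sketch below follows that standard definition. The function names are assumptions, and the later differential-threshold and clique steps are not shown.

```python
import math
from statistics import median
from typing import List, Sequence


def _biweight_terms(x: Sequence[float]) -> List[float]:
    """Median-centred, Tukey-biweighted, unit-norm version of x."""
    med = median(x)
    mad = median([abs(v - med) for v in x])
    if mad == 0:
        raise ValueError("median absolute deviation is zero")
    terms = []
    for v in x:
        u = (v - med) / (9.0 * mad)
        weight = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0  # far outliers get weight 0
        terms.append((v - med) * weight)
    norm = math.sqrt(sum(t * t for t in terms))
    return [t / norm for t in terms]


def bicor(x: Sequence[float], y: Sequence[float]) -> float:
    """Biweight midcorrelation between two expression profiles."""
    return sum(a * b for a, b in zip(_biweight_terms(x), _biweight_terms(y)))


x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
y = [1, 2, 3, 4, 5, 6, 7, 8, 9, -100]
print(bicor(x, y))  # positive: the single outlier pair is down-weighted to zero
```

The Pearson correlation of these two vectors is negative because the one extreme pair dominates it. The biweight weights drop that pair entirely, so the agreement of the other nine samples comes through.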
However, as an affinity-based method, MeRIP-Seq has not yet provided base-pair resolution. A single methylation site determined from MeRIP-Seq data can in practice contain multiple methylated RNA residues, some of which may be regulated by different enzymes and thus be differentially methylated between two conditions. Since existing peak-based methods cannot effectively differentiate multiple methylation residues located within a single methylation site, we propose a hidden Markov model (HMM) based approach to address this issue. Specifically, each detected RNA methylation site is further divided into multiple adjacent small bins and then scanned at higher resolution, using a hidden Markov model that captures the dependency between spatially adjacent bins for improved accuracy. We tested the proposed algorithm on both simulated data and real data. In this study, we have developed a network-based approach to identify molecular biomarkers that can distinguish SCLC from NSCLC. We first identify positive and negative coexpression gene pairs in normal lung tissues, SCLC, or NSCLC samples and combine them with functional association information from the STRING network to construct a lung cancer-specific gene association network. From the network, we obtain gene modules in which genes are highly functionally associated with each other and are either positively or negatively coexpressed in the three conditions. Then, we identify gene modules that not only are differentially expressed between cancer and normal samples but also show distinctive expression patterns between SCLC and NSCLC. Finally, we select genes inside those modules with discriminating coexpression patterns between the two lung cancer subtypes and predict them as candidate biomarkers of diagnostic use. The selective pressure decreases from the zygote stage to the 4-cell stage, increases at the 8-cell stage, and then decreases again from the 8-cell stage to the late blastocyst stages.
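The bin-level HMM scan in the MeRIP-Seq study above can be illustrated with a two-state Viterbi decoder. In this hypothetical setup, each bin is reduced to a discrete observation (0 = weak differential signal, 1 = strong). The hidden states are "not differentially methylated" (0) and "differentially methylated" (1). The start, transition, and emission probabilities are made-up values, whereas the authors' model would use read-count emissions estimated from data.

```python
import math
from typing import List, Sequence


def _log(p: float) -> float:
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs: Sequence[int], start: Sequence[float],
            trans: Sequence[Sequence[float]], emit: Sequence[Sequence[float]]) -> List[int]:
    """Most likely hidden-state path for a discrete-emission HMM, computed in log space."""
    states = range(len(start))
    score = [_log(start[s]) + _log(emit[s][obs[0]]) for s in states]
    backptrs = []
    for o in obs[1:]:
        prev = score
        step_ptr, score = [], []
        for s in states:
            # best predecessor state for landing in state s at this bin
            best = max(states, key=lambda r: prev[r] + _log(trans[r][s]))
            step_ptr.append(best)
            score.append(prev[best] + _log(trans[best][s]) + _log(emit[s][o]))
        backptrs.append(step_ptr)
    state = max(states, key=lambda s: score[s])
    path = [state]
    for step_ptr in reversed(backptrs):  # trace the best path back to the first bin
        state = step_ptr[state]
        path.append(state)
    return path[::-1]


# Hypothetical parameters: "sticky" transitions make adjacent bins tend to share a label.
start = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [[0.9, 0.1], [0.1, 0.9]]  # rows: hidden state; columns: observation 0 / 1
bins = [0, 0, 1, 1, 1, 0, 0]
print(viterbi(bins, start, trans, emit))  # [0, 0, 1, 1, 1, 0, 0]
```

The transition matrix encodes the spatial dependency between adjacent bins. A lone strong bin surrounded by weak ones tends to be smoothed away, while a run of strong bins is called as a differentially methylated stretch.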
Previous EVO-DEVO studies of whole-embryo development neglected the fluctuation of selective pressure in these earlier stages, and this fluctuation was potentially correlated with early-stage events such as zygotic genome activation (ZGA). Such oscillation at an earlier stage would further affect models of the evolutionary constraints on whole-embryo development. Therefore, these earlier stages should be measured intensively in future EVO-DEVO studies. Tiancheng Liu, Lin Yu, Guohui Ding, Zhen Wang, Lei Liu, Hong Li, and Yixue Li Copyright © 2015 Tiancheng Liu et al. All rights reserved. Statistical Genomic Approach Identifies Association between FSHR Polymorphisms and Polycystic Ovary Morphology in Women with Polycystic Ovary Syndrome Sun, 26 Jul 2015 12:40:17 +0000 Background. Single-nucleotide polymorphisms (SNPs) in the follicle stimulating hormone receptor (FSHR) gene are associated with PCOS. However, their relationship to polycystic ovary (PCO) morphology remains unknown. This study aimed to investigate whether PCOS-related SNPs in the FSHR gene are associated with PCO in women with PCOS. Methods. Patients were grouped into PCO and non-PCO groups. Genomic genotypes were profiled using the Affymetrix human genome SNP chip 6. Two polymorphisms (rs2268361 and rs2349415) of FSHR were analyzed using a statistical approach. Results. Significant differences were found in the allele distributions of the GG genotype of rs2268361 between the PCO and non-PCO groups (27.6% GG, 53.4% GA, and 19.0% AA versus 33.3% GG, 36.5% GA, and 30.2% AA), whereas no significant differences were found in the allele distributions of the GG genotype of rs2349415. Although different breast cancer cell lines have been widely used in laboratory investigations, evidence accumulated over the past decades has indicated that genomic differences exist between cancer cell lines and tissue samples.
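Genotype distributions such as the rs2268361 GG/GA/AA comparison above are often compared with Pearson's chi-square test on a 2 × 3 table of genotype counts. That test is assumed here; the abstract only says "a statistical approach." With 2 degrees of freedom, the chi-square survival function has the closed form exp(−x/2), which keeps this sketch dependency-free. The table must hold actual patient counts, which the abstract does not report, so the numbers below are hypothetical.

```python
import math
from typing import Sequence, Tuple


def chi_square_2x3(table: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Pearson chi-square statistic and p-value for a 2 x 3 genotype-count table.

    Rows are groups (e.g. PCO, non-PCO); columns are genotypes (GG, GA, AA).
    With (2 - 1) * (3 - 1) = 2 degrees of freedom, P(X > x) = exp(-x / 2).
    """
    if len(table) != 2 or any(len(row) != 3 for row in table):
        raise ValueError("expected a 2 x 3 table")
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[r][c] for r in range(2)) for c in range(3)]
    total = sum(row_tot)
    if total == 0 or 0 in row_tot or 0 in col_tot:
        raise ValueError("every row and column needs at least one observation")
    stat = 0.0
    for r in range(2):
        for c in range(3):
            expected = row_tot[r] * col_tot[c] / total  # count expected under independence
            stat += (table[r][c] - expected) ** 2 / expected
    return stat, math.exp(-stat / 2.0)


# Hypothetical counts (GG, GA, AA) for two groups.
print(chi_square_2x3([[30, 10, 10], [10, 10, 30]]))
```

With small groups, some expected counts fall below about 5. In that case an exact test is usually preferred over the chi-square approximation.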
The abundant molecular profiles of cancer cell lines and tumor samples deposited in the Cancer Cell Line Encyclopedia and The Cancer Genome Atlas now allow a systematic comparison of breast cancer cell lines with breast tumors. We characterized primary breast tumors genomically, based on copy number variation and gene expression profiles, and compared the breast cancer cell lines with different subgroups of breast tumors. We found that some breast cancer cell lines correlate highly with the tumor group that agrees with previous knowledge, while a large proportion do not, including the most widely used MCF7, MDA-MB-231, and T-47D. In this review paper, the use of ridge regression for prediction in quantitative genetics using single-nucleotide polymorphism data is discussed. In particular, we consider (i) the theoretical foundations of ridge regression, (ii) its link to commonly used methods in animal breeding, (iii) its computational feasibility, and (iv) the scope for constructing prediction models with nonlinear effects (e.g., dominance and epistasis). Based on a simulation study, we gauge the current and future potential of ridge regression for predicting human traits from genome-wide SNP data. For outcomes with a relatively simple genetic architecture, and at the sample sizes currently available in most cohorts, we conclude that ridge regression predicts slightly more accurately than the classical genome-wide association study approach of repeated simple regression (i.e., one regression per SNP). However, both capture only a small proportion of the heritability. Prediction of MicroRNA-Disease Associations Based on Social Network Analysis Methods Sun, 26 Jul 2015 07:38:47 +0000 MicroRNAs constitute an important class of noncoding, single-stranded, ~22-nucleotide-long RNA molecules encoded by endogenous genes. They play an important role in regulating gene transcription and normal development.
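The ridge regression discussed in the review above has the closed form β̂ = (XᵀX + λI)⁻¹Xᵀy. The dependency-free sketch below solves that linear system and, for comparison, computes the "one regression per SNP" marginal slopes. It assumes a centred phenotype and centred genotype columns, so no intercept is fitted. λ is a user-chosen value, not one from the review, and the solver assumes XᵀX + λI is nonsingular, which holds for any λ > 0.

```python
from typing import List, Sequence


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ridge(X: Sequence[Sequence[float]], y: Sequence[float], lam: float) -> List[float]:
    """Ridge estimate (X^T X + lam I)^(-1) X^T y for centred data (no intercept)."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def marginal_slopes(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """'One regression per SNP': a separate no-intercept slope for each column."""
    p = len(X[0])
    return [sum(row[j] * yi for row, yi in zip(X, y)) / sum(row[j] ** 2 for row in X)
            for j in range(p)]


X = [[1, 0], [0, 1], [-1, 0], [0, -1]]
y = [2.0, 0.1, -2.0, -0.1]
print(ridge(X, y, lam=2.0), marginal_slopes(X, y))
```

All SNPs enter ridge jointly, so correlated SNPs share their effect instead of each being credited with it separately. Even when SNPs are uncorrelated, as in this toy example, ridge shrinks every marginal slope toward zero by the same factor.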
MicroRNAs can be associated with disease; however, only a few microRNA-disease associations have been confirmed by traditional experimental approaches. We introduce two methods to predict microRNA-disease associations. The first method, KATZ, integrates social network analysis with machine learning and is based on networks derived from known microRNA-disease associations, disease-disease associations, and microRNA-microRNA associations. The other method, CATAPULT, is a supervised machine learning method. In such situations, being able to predict the fate of agents in foods would help risk assessors and decision makers assess the potential effects of a specific contamination event and thus deduce the appropriate mitigation measures. One efficient strategy supporting this is model-based simulation. However, application in crisis situations requires ready-to-use, easy-to-adapt models to be available from so-called food safety knowledge bases. Here, we illustrate this concept and its benefits by applying the modular open source software tools PMM-Lab and FoodProcess-Lab. As a fictitious sample scenario, an intentional ricin contamination at a beef salami production facility was modelled. Predictive models describing the inactivation of ricin were reviewed, relevant models were implemented with PMM-Lab, and simulations of residual toxin amounts in the final product were performed with FoodProcess-Lab. For noninvasive, inexpensive ultrasonographic analysis, the quantitative assessment of liver fat content must be validated so that fully automated, reliable computer-aided software can assist medical practitioners without operator subjectivity. In this study, we attempt to quantify the hepatorenal index difference between the liver and the kidney with respect to the multiple severity levels of hepatic steatosis.
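The KATZ method above rests on the Katz index. This index scores a pair of nodes by counting walks of every length between them, down-weighting a walk of length l by βˡ, so that S = Σ_{l=1}^{k} βˡAˡ. The sketch truncates the sum at a maximum walk length. In a microRNA-disease setting, A would be the adjacency matrix of the combined heterogeneous network. The β value, the walk-length limit, and the toy network here are illustrative assumptions.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(A: Sequence[Sequence[float]], B: Sequence[Sequence[float]]) -> Matrix:
    """Plain matrix product of two nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def katz_scores(adj: Sequence[Sequence[float]], beta: float = 0.1, max_len: int = 3) -> Matrix:
    """Truncated Katz index: sum over l = 1..max_len of beta**l * A**l."""
    n = len(adj)
    power = [list(map(float, row)) for row in adj]  # A^1
    scores = [[beta * power[i][j] for j in range(n)] for i in range(n)]
    for length in range(2, max_len + 1):
        power = matmul(power, adj)  # now A^length
        weight = beta ** length
        for i in range(n):
            for j in range(n):
                scores[i][j] += weight * power[i][j]
    return scores


# Toy network: path 0 - 1 - 2. Nodes 0 and 2 are not linked directly,
# but share a length-2 walk through node 1, so their score is positive.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(katz_scores(adj, beta=0.5, max_len=3)[0][2])  # 0.25
```

The untruncated Katz sum converges only when β is smaller than the reciprocal of the largest eigenvalue of A. Truncating at a short walk length sidesteps that condition and also reflects the intuition that long, indirect paths carry little evidence for an association.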
To do this, a series of carefully designed image processing techniques, including fuzzy stretching and edge tracking, are applied to extract regions of interest. Then, an unsupervised neural learning algorithm, the self-organizing map, is designed to establish characteristic clusters from the image. The distribution of hepatorenal index values across the different levels of fatty liver status is then verified experimentally to estimate how the hepatorenal index differs between these levels. Bilateral Image Subtraction and Multivariate Models for the Automated Triaging of Screening Mammograms Thu, 09 Jul 2015 11:35:14 +0000 Mammography is the most common and effective breast cancer screening test. However, the rate of positive findings is very low, which makes radiologic interpretation monotonous and prone to errors. This work presents a computer-aided diagnosis (CADx) method aimed at automatically triaging mammogram sets. The method coregisters the left and right mammograms, extracts image features, and classifies subjects as at risk of having malignant calcifications (CS) or malignant masses (MS), or as healthy subjects (HS). In this study, 449 subjects (197 CS, 207 MS, and 45 HS) from a public database were used to train and evaluate the CADx. Percentile-rank (-rank) and -normalizations were used. Functional and Structural Consequences of Damaging Single Nucleotide Polymorphisms in Human Prostate Cancer Predisposition Gene RNASEL Wed, 08 Jul 2015 09:10:00 +0000 Prostate cancer (PrCa), a commonly diagnosed cancer, has been linked to the gene RNASEL (previously known as PRCA1). RNASEL codes for ribonuclease L, an integral part of the interferon-regulated system that mediates the antiviral and antiproliferative roles of interferons. Both somatic and germline mutations have been implicated in causing prostate cancer.
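The hepatorenal index used in the steatosis study above is commonly defined as the ratio of the mean echo intensity in a liver region of interest to that in a renal-cortex region of interest. The sketch below assumes both regions have already been extracted; the study uses fuzzy stretching and edge tracking for that step. Once the regions exist, the index itself is a simple ratio. The function names and the nested-list pixel representation are assumptions.

```python
from statistics import fmean
from typing import Sequence


def roi_mean(roi: Sequence[Sequence[float]]) -> float:
    """Mean grey level of a region of interest stored as rows of pixel values."""
    return fmean(v for row in roi for v in row)


def hepatorenal_index(liver_roi: Sequence[Sequence[float]],
                      kidney_roi: Sequence[Sequence[float]]) -> float:
    """Ratio of mean liver brightness to mean renal-cortex brightness."""
    kidney = roi_mean(kidney_roi)
    if kidney == 0:
        raise ValueError("kidney ROI has zero mean intensity")
    return roi_mean(liver_roi) / kidney


# Hypothetical 2 x 2 pixel patches from one ultrasound frame.
print(hepatorenal_index([[120, 130], [125, 125]], [[100, 100], [100, 100]]))  # 1.25
```

Fat infiltration makes the liver more echogenic, so the index rises with steatosis severity. That is why its distribution is compared across severity levels.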
Using the array of single nucleotide polymorphism data available in dbSNP, this study is designed to sort out functional SNPs in RNASEL. It applies various established computational tools, such as SIFT, PolyPhen, SNPs&GO, Fathmm, ConSurf, UTRScan, PDBsum, Tm-Align, I-Mutant, and Project HOPE, for functional and structural assessment, solvent accessibility, molecular dynamics, and energy minimization studies. However, most existing motif discovery algorithms are time-consuming and limited in their ability to identify binding motifs in ChIP-seq data, which are typically large-scale. To improve efficiency, we propose a fast cluster motif finding algorithm, named FCmotif, to identify motifs in large-scale ChIP-seq data sets. Inspired by the emerging substring mining strategy, it finds enriched substrings and then searches their neighborhood instances to construct position weight matrices (PWMs) and cluster motifs of different lengths. FCmotif does not follow the OOPS (one occurrence per sequence) model constraint and can find long motifs. The effectiveness of the proposed algorithm was demonstrated by experiments on ChIP-seq data sets from mouse ES cells. Detecting the real binding motifs and processing the full-size data of several megabytes took only a few minutes. When searching for coexpressed genes, the data mining process is strongly affected by the choice of algorithm. Thus, it is highly desirable to provide multiple algorithm options in a user-friendly analytical toolkit for exploring gene expression signatures. For this purpose, we developed GESearch, an interactive graphical user interface (GUI) toolkit, which is written in MATLAB and supports a variety of gene expression data files.
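FCmotif's first idea, finding substrings that are enriched in ChIP-seq peaks relative to background and then summarizing their instances as a position weight matrix, can be sketched as below. Several choices here are assumptions for illustration: counting sequence-level presence, using a fixed k, applying pseudocount smoothing, and filtering with a ratio threshold. FCmotif's actual neighborhood search and motif clustering are not reproduced, and all names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def kmer_presence(seqs: Sequence[str], k: int) -> Counter:
    """Number of sequences that contain each k-mer at least once."""
    counts: Counter = Counter()
    for s in seqs:
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})  # set: once per sequence
    return counts


def emerging_kmers(fg: Sequence[str], bg: Sequence[str], k: int,
                   min_ratio: float = 2.0, pseudo: float = 1.0) -> List[Tuple[str, float]]:
    """k-mers whose foreground frequency is at least min_ratio times the background one."""
    fg_counts, bg_counts = kmer_presence(fg, k), kmer_presence(bg, k)
    hits = []
    for kmer, c in fg_counts.items():
        fg_frac = c / len(fg)
        bg_frac = (bg_counts.get(kmer, 0) + pseudo) / (len(bg) + pseudo)  # smoothed
        ratio = fg_frac / bg_frac
        if ratio >= min_ratio:
            hits.append((kmer, ratio))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


def position_weight_matrix(instances: Sequence[str]) -> List[Dict[str, float]]:
    """Per-position base frequencies of equal-length motif instances."""
    pwm = []
    for pos in range(len(instances[0])):
        column = Counter(s[pos] for s in instances)
        pwm.append({base: column.get(base, 0) / len(instances) for base in "ACGT"})
    return pwm


peaks = ["AACGTT", "CACGTG", "TACGTA"]      # hypothetical ChIP-seq peak sequences
background = ["AAAAAA", "TTTTTT", "GGGGGG"]  # hypothetical background sequences
print(emerging_kmers(peaks, background, k=4))
print(position_weight_matrix(["ACGT", "ACGA"]))
```

Counting k-mers takes a single pass over the data, which is what lets this style of algorithm scale to megabyte-sized peak sets. The expensive refinement is then reserved for the few enriched seeds.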
This analytical toolkit provides four models, namely the mean, regression, delegate, and ensemble models, to identify coexpressed genes. It also lets users filter data and select gene expression patterns by browsing the display window or by importing knowledge-based genes. Subsequently, the utility of this analytical toolkit is demonstrated by analyzing two real-life microarray datasets from cell-cycle experiments. Network-Based Logistic Classification with an Enhanced Solver Reveals Biomarker and Subnetwork Signatures for Diagnosing Lung Cancer Tue, 16 Jun 2015 08:08:23 +0000 Identifying biomarkers and signaling pathways is a critical step in genomic studies, in which regularization is a widely used feature extraction approach. However, most of the regularizers are based on -norm, and their results are not good enough in terms of sparsity and interpretation and are asymptotically biased, especially in genomic research. Recently, a large amount of molecular interaction information about disease-related biological processes has been gathered in various databases, which focus on many aspects of biological systems. In this paper, we use an enhanced penalized solver to penalize a network-constrained logistic regression model, called an enhanced net, in which the predictors are based on gene expression data with biological network knowledge. Extensive simulation studies showed that our proposed approach outperforms regularization, the old penalized solver, and the Elastic net approaches in terms of classification accuracy and stability. Furthermore, we applied our method to lung cancer data and found that it achieves higher predictive accuracy than regularization, the old penalized solver, and the Elastic net approaches, while selecting fewer but informative biomarkers and pathways. Hai-Hui Huang, Yong Liang, and Xiao-Ying Liu Copyright © 2015 Hai-Hui Huang et al. All rights reserved.
The Impact of Normalization Methods on RNA-Seq Data Analysis Mon, 15 Jun 2015 12:14:24 +0000 High-throughput sequencing technologies, such as the Illumina Hi-seq, are powerful new tools for investigating a wide range of biological and medical problems. The massive and complex data sets produced by the sequencers create a need for statistical and computational methods that can handle data analysis and management. Data normalization is one of the most crucial steps of data processing, and it must be considered carefully because it has a profound effect on the results of the analysis. In this work, we comprehensively compare five normalization methods related to sequencing depth that are widely used for transcriptome sequencing (RNA-seq) data, and we assess their impact on the results of gene expression analysis. Based on this study, we suggest a universal workflow that can be applied to select the optimal normalization procedure for any particular data set. The described workflow includes calculation of the bias and variance values for the control genes, the sensitivity and specificity of the methods, and classification errors, as well as generation of diagnostic plots. However, the usual relevance and redundancy criteria have the disadvantage of being too sensitive to outlying measurements and/or inefficient. We propose a novel approach called Minimum Regularized Redundancy Maximum Robust Relevance (MRRMRR), suitable for noisy high-dimensional data observed in two groups. It combines principles of regularization and robust statistics. In particular, redundancy is measured by a new regularized version of the coefficient of multiple correlation. Relevance is measured by a highly robust correlation coefficient based on least weighted squares regression with data-adaptive weights. We compare various dimensionality reduction methods on three real data sets.
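The normalization study above does not list its five depth-related methods. For illustration, two widely used ones are sketched below: counts per million (CPM), and DESeq-style median-of-ratios size factors. Whether either was among the five compared is not stated here. The gene-by-sample list layout and the function names are assumptions.

```python
import math
from statistics import median
from typing import List, Sequence


def cpm(counts: Sequence[Sequence[float]]) -> List[List[float]]:
    """Counts per million: divide each sample (column) by its library size."""
    totals = [sum(col) for col in zip(*counts)]
    return [[c / t * 1e6 for c, t in zip(row, totals)] for row in counts]


def size_factors(counts: Sequence[Sequence[float]]) -> List[float]:
    """Median-of-ratios size factors; genes with a zero count in any sample are skipped."""
    n_samples = len(counts[0])
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene is expressed in every sample")
    # per-gene log geometric mean across samples acts as a pseudo-reference sample
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - g for row, g in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts: Sequence[Sequence[float]], factors: Sequence[float]) -> List[List[float]]:
    """Divide each sample's counts by its size factor."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Hypothetical genes x samples matrix: sample 2 was sequenced about twice as deeply.
counts = [[10, 20], [5, 10], [100, 200], [0, 3]]
f = size_factors(counts)
print(f, normalize(counts, f))
```

CPM is driven by total library size, so a few extremely highly expressed genes can distort it. The median of per-gene ratios is much less sensitive to such genes, which is one reason different depth normalizations can change downstream differential-expression results.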
To investigate the influence of noise or outliers on the data, we also perform the computations on data artificially contaminated by severe noise of various forms. The experimental results confirm the robustness of the method with respect to outliers. This deluge of data challenges scientists in many ways, as they face data management issues and analysis and visualization drawbacks due to the limitations of current bioinformatics tools. In this paper, we describe how the NGS Big Data revolution changes the way data are managed and analysed. We show how biologists are confronted with an abundance of methods, tools, and data formats. To overcome these problems, we focus on Big Data information technology innovations from the web and business intelligence. We underline the interest of NoSQL databases, which are much more efficient than relational databases. Because high processing times on Big Data lead to a loss of interactivity during analysis, we describe solutions from Business Intelligence that allow one to regain interactivity regardless of data volume. We illustrate this point with a focus on the Amadea platform. For this, the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database was mined for case-control candidates. At least 652 baseline features extracted from MRI and PET analyses, biological samples, and clinical data up to February 2014 were used. A feature selection methodology combining a genetic algorithm search coupled to a logistic regression classifier with forward and backward selection strategies was used to explore combinations of features. This generated diagnostic models containing 3 to 8 features, including well-documented AD biomarkers as well as unexplored image, biochemical, and clinical features. Accuracies of 0.85, 0.79, and 0.80 were achieved for the HC-AD, HC-MCI, and MCI-AD classifications, respectively, when evaluated on a blind test set.
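The AD study combines a genetic algorithm, a logistic regression classifier, and forward and backward selection. The minimal sketch below shows only a generic greedy forward-selection step, with a pluggable scoring function standing in for classifier accuracy. The function `forward_selection`, the feature names, and the additive toy score are hypothetical; this is not the authors' pipeline.

```python
def forward_selection(features, score, max_size):
    """Greedy forward selection: repeatedly add the feature that most improves
    score(subset); stop when no candidate improves it or max_size is reached."""
    selected, best = [], float("-inf")
    while len(selected) < max_size:
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        # Evaluate every one-feature extension of the current subset.
        cand_score, cand = max(((score(selected + [f]), f) for f in candidates),
                               key=lambda pair: pair[0])
        if cand_score <= best:
            break  # no remaining feature improves the score
        selected.append(cand)
        best = cand_score
    return selected, best

# Hypothetical stand-in for cross-validated accuracy: additive per-feature weights.
weights = {"f1": 3.0, "f2": 2.0, "f3": -1.0}
subset, value = forward_selection(list(weights), lambda s: sum(weights[f] for f in s), 8)
print(subset, value)  # ['f1', 'f2'] 5.0
```

The `max_size` cap mirrors the abstract's bounded model sizes (3 to 8 features). In practice, `score` would wrap a classifier evaluated on held-out data.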
In conclusion, a set of features provided information additional to, and independent of, well-established AD biomarkers, aiding in the classification of MCI and AD. Antonio Martínez-Torteya, Víctor Treviño, and José G. Meta-analyses showed that rs3746444 was associated with CRC risk in Caucasians (OR = 0.57, 95% CI = 0.34–0.95). In the subgroup analysis, we found significant associations between rs2910164 and CRC in hospital-based studies (OR = 1.24, 95% CI = 1.03–1.49). rs2292832 may be a high-risk factor for CRC in population-based studies (OR = 1.18, 95% CI = 1.08–1.38). Conclusion. This meta-analysis showed that rs2910164 and rs2292832 may increase the risk of CRC, whereas the rs11614913 polymorphism may reduce it. rs3746444 may be associated with a decreased risk of CRC in Caucasians. Xiao-Xu Liu, Meng Wang, Dan Xu, Jian-Hai Yang, Hua-Feng Kang, Xi-Jing Wang, Shuai Lin, Peng-Tao Yang, Xing-Han Liu, and Zhi-Jun Dai Copyright © 2015 Xiao-Xu Liu et al. All rights reserved. Multiblock Discriminant Analysis for Integrative Genomic Study Sun, 17 May 2015 11:45:02 +0000 Human diseases are abnormal medical conditions in which multiple biological components are intricately involved. Nevertheless, most research has used a single type of genetic data, such as Single Nucleotide Polymorphism (SNP) or Copy Number Variation (CNV) data. Furthermore, epigenetic modifications and transcriptional regulation must be considered alongside genomic variants to fully understand complex human diseases. We call the collection of multiple heterogeneous data “multiblock data.” In this paper, we propose a novel Multiblock Discriminant Analysis (MultiDA) method that provides a new integrative genomic model for multiblock analysis and an efficient algorithm for discriminant analysis. The integrative genomic model is built by exploiting representative genomic data, including SNP, CNV, DNA methylation, and gene expression data.
The efficient algorithm for discriminant analysis identifies discriminative factors of the multiblock data. Discriminant analysis is essential for discovering biomarkers in computational biology. The present study identifies caspase-3 as a common player involved in the regulation of multiple neurodegenerative disorders, namely, Alzheimer’s disease (AD), Parkinson’s disease (PD), Huntington’s disease (HD), and amyotrophic lateral sclerosis (ALS). The protein interaction network prepared using the STRING database provides strong evidence of caspase-3 interactions with the metabolic cascades of these neurodegenerative disorders, characterizing it as a potential therapeutic target for multiple neurodegenerative disorders. In silico molecular docking of selected nonpeptidyl natural compounds against caspase-3 revealed potent leads against this common therapeutic target. Rosmarinic acid and curcumin proved to be the most promising ligands (leads), mimicking the inhibitory action of peptidyl inhibitors with the highest GOLD fitness scores of 57.38 and 53.51, respectively. Therefore, pinpointing the molecular mechanisms underlying the pathogenesis of oral lichen planus (OLP) is important for developing efficient treatments for OLP. Recently, the accumulation of large amounts of omics data, especially transcriptome data, has provided opportunities to investigate OLP from a systematic perspective. In this paper, assuming that OLP-associated genes have functional relationships, we present a new approach to identify OLP-related gene modules from gene regulatory networks. In particular, we find that the gene modules regulated by both transcription factors (TFs) and microRNAs (miRNAs) play important roles in the pathogenesis of OLP, and many genes in these modules have been reported in the literature to be related to OLP. Yu-Ling Zuo, Di-Ping Gong, Bi-Ze Li, Juan Zhao, Ling-Yue Zhou, Fang-Yang Shao, Zhao Jin, and Yuan He Copyright © 2015 Yu-Ling Zuo et al. All rights reserved.
Prediction of Metabolic Gene Biomarkers for Neurodegenerative Disease by an Integrated Network-Based Approach Sun, 03 May 2015 11:35:44 +0000 Neurodegenerative diseases (NDs), such as Parkinson’s disease (PD) and Huntington’s disease (HD), have become increasingly common among older people worldwide. One hallmark of NDs is the intracellular accumulation of specific pathogenic proteins, which may result from abnormal function of metabolic processes. Previously, we developed a computational method named Met-express that predicted key enzyme-coding genes in cancer development by integrating a cancer gene coexpression network with the metabolic network. Here, we applied Met-express to predict key enzyme-coding genes in both PD and HD. Functional enrichment analysis and a literature review of the predicted genes suggested that PD and HD might share some common pathogenic metabolic pathways. Cell-penetrating peptides (CPPs) are short peptides developed as functionalized vectors for delivering membrane-impermeable agents. On the cell surface, negatively charged heparan sulfate (HS) provides the initial attachment site for basic CPPs through electrostatic interaction, leading to multiple cellular effects. Here, a functional peptide (CPPecp) has been identified from the critical HS-binding region of hRNase3, a unique RNase family member with in vitro antitumor activity. In this study, we analyze a set of HS-binding CPPs derived from natural proteins, including CPPecp. In addition to cellular binding and internalization, CPPecp demonstrated multiple functions, including strong binding to tumor cell surfaces with higher HS expression, significant inhibition of cancer cell migration, and suppression of angiogenesis in vitro and in vivo. Moreover, unlike conventional highly basic CPPs, CPPecp enabled magnetic nanoparticles to selectively target the tumor site in vivo.
We thus present ADAM, A Database of Anti-Microbial peptides, which contains 7,007 unique sequences and 759 structures, to systematically establish comprehensive associations between AMP sequences and structures through structural folds and to provide easy access to their relationships. Thirty distinct AMP structural fold clusters containing more than one structure were detected, and about a thousand AMPs are associated with at least one structural fold cluster. According to ADAM, AMP structural folds are limited: AMPs cover only about 3% of the overall protein fold space. Hao-Ting Lee, Chen-Che Lee, Je-Ruei Yang, Jim Z. C. Lai, and Kuan Y. Chang Copyright © 2015 Hao-Ting Lee et al. All rights reserved. Predicting Flavin and Nicotinamide Adenine Dinucleotide-Binding Sites in Proteins Using the Fragment Transformation Method Mon, 27 Apr 2015 11:48:34 +0000 We developed a computational method to identify NAD- and FAD-binding sites in proteins. First, we extracted from the Protein Data Bank the structures of proteins that bind at least one of these ligands. NAD-/FAD-binding residue templates were then constructed by identifying binding residues through the ligand-binding database BioLiP. The fragment transformation method was used to identify structures within query proteins that resembled the ligand-binding templates. By comparing residue types and their relative spatial positions, potential binding sites were identified, and a ligand-binding potential was calculated for each residue. With the false positive rate set at 5%, our method predicted NAD- and FAD-binding sites at true positive rates of 67.1% and 68.4%, respectively. Our method provides excellent results for identifying FAD- and NAD-binding sites in proteins; most importantly, it allows verification of the requirement that residue types and local structures be conserved in FAD- and NAD-binding sites.
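The NAD/FAD result is reported as a true positive rate at a fixed 5% false positive rate. The hedged sketch below shows how such a figure can be read off per-residue scores. The function name and the integer scores are hypothetical, and the paper's own ligand-binding potential is not reproduced here.

```python
def tpr_at_fpr(pos_scores, neg_scores, max_fpr=0.05):
    """Return (TPR, threshold) for the lowest score threshold whose false positive
    rate does not exceed max_fpr. Items scoring >= threshold are called positive."""
    best_tpr, best_thr = 0.0, float("inf")
    for thr in sorted(set(pos_scores) | set(neg_scores), reverse=True):
        fpr = sum(s >= thr for s in neg_scores) / len(neg_scores)
        if fpr > max_fpr:
            break  # FPR can only grow as the threshold is lowered further
        best_tpr = sum(s >= thr for s in pos_scores) / len(pos_scores)
        best_thr = thr
    return best_tpr, best_thr

# Hypothetical binding-potential scores for binding (pos) and non-binding (neg) residues.
neg = list(range(20))
pos = [15, 18, 20, 25]
print(tpr_at_fpr(pos, neg))  # (0.5, 19)
```

With 20 negatives, a 5% ceiling allows exactly one false positive, so the threshold stops at 19.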
High Order Gene-Gene Interactions in Eight Single Nucleotide Polymorphisms of Renin-Angiotensin System Genes for Hypertension Association Study Sun, 19 Apr 2015 09:58:13 +0000 Several single nucleotide polymorphisms (SNPs) of renin-angiotensin system (RAS) genes are associated with hypertension (HT), but most studies have focused on single-locus effects. Here, we introduce an unbalanced function based on multifactor dimensionality reduction (MDR) for multilocus genotypes to detect high-order gene-gene (SNP-SNP) interactions in unbalanced HT case and control data. Eight SNPs of three RAS genes (angiotensinogen, AGT; angiotensin-converting enzyme, ACE; angiotensin II type 1 receptor, AT1R) that showed no significant genotype differences between HT and non-HT subjects were included. In 2- to 6-locus models of SNP-SNP interaction, the SNPs of the AGT and ACE genes were associated with hypertension (bootstrapping odds ratio [Boot-OR] = 1.972~3.785; 95% confidence interval (CI) 1.26~6.21; ). The electric field distribution affects the results of single-cell impedance measurements, and the electrode geometry in turn affects the electric field distribution. Therefore, this study used the COMSOL Multiphysics package to perform finite element method (FEM) simulations of the effects of electrode geometry in microfluidic devices. An equivalent circuit model incorporating the PBS solution, a pair of electrodes, and a cell is used to obtain the impedance of a single HeLa cell. Simulations indicated that the circle and parallel electrodes provide higher electric field strength than the cross and standard electrodes at the same operating voltage. Additionally, increasing the operating voltage reduces the impedance magnitude of a single HeLa cell for all electrode shapes. A lower impedance magnitude of the single HeLa cell increases measurement sensitivity, but a higher operating voltage will damage the cell.
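The hypertension study builds on multifactor dimensionality reduction (MDR). The sketch below shows only the core step of standard MDR, not the paper's own "unbalanced function". Each multilocus genotype cell is labelled high-risk when its case:control ratio exceeds the overall ratio, and the resulting rule is scored by balanced accuracy, a common choice for unbalanced case and control counts. All names and genotypes are hypothetical.

```python
from collections import Counter

def mdr_high_risk_cells(genotypes, labels):
    """Label each multilocus genotype cell high-risk when its case:control ratio exceeds
    the overall case:control ratio (the usual MDR threshold for unbalanced data).
    genotypes: one tuple of SNP genotypes per subject; labels: 1 = case, 0 = control."""
    cases = Counter(g for g, y in zip(genotypes, labels) if y == 1)
    controls = Counter(g for g, y in zip(genotypes, labels) if y == 0)
    threshold = sum(cases.values()) / sum(controls.values())
    high = set()
    for cell in set(cases) | set(controls):
        if controls[cell] == 0 or cases[cell] / controls[cell] > threshold:
            high.add(cell)
    return high

def balanced_accuracy(genotypes, labels, high):
    """Mean of sensitivity and specificity when high-risk cells are predicted as cases."""
    tp = sum(1 for g, y in zip(genotypes, labels) if y == 1 and g in high)
    fn = sum(1 for g, y in zip(genotypes, labels) if y == 1 and g not in high)
    tn = sum(1 for g, y in zip(genotypes, labels) if y == 0 and g not in high)
    fp = sum(1 for g, y in zip(genotypes, labels) if y == 0 and g in high)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))

# Hypothetical two-SNP data: 4 cases and 6 controls.
genos = [("AA", "CC")] * 4 + [("AG", "CT")] * 4 + [("GG", "TT")] * 2
labels = [1, 1, 1, 0] + [1, 0, 0, 0] + [0, 0]
high = mdr_high_risk_cells(genos, labels)
print(high, balanced_accuracy(genos, labels, high))
```

A full MDR analysis repeats this for every 2- to k-locus SNP combination inside cross-validation and keeps the combination with the best held-out balanced accuracy.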
The conserved HOLI (ligand-binding domain of hormone receptors) and ZnF_C4 (C4 zinc finger in nuclear hormone receptors) domains are essential for maintaining the basic roles of the PPAR gene family, and the variant domains of LCRs may be responsible for their functional divergence. The positively selected sites in the HOLI domain favor the evolution of PPARs toward diverse functions. The evolutionary variants in the promoter and 3′ UTR regions of PPARs result in different transcription factors and miRNAs regulating individual PPAR members, which may eventually affect their expression and tissue distribution. These results indicate that the gene duplication event, the selection pressure on the HOLI domain, and the variants in the promoter and 3′ UTR are essential for the evolution of PPARs and their acquisition of diverse functions. Tianyu Zhou, Xiping Yan, Guosong Wang, Hehe Liu, Xiang Gan, Tao Zhang, Jiwen Wang, and Liang Li Copyright © 2015 Tianyu Zhou et al. All rights reserved. Relationship between Hyperuricemia and Haar-Like Features on Tongue Images Wed, 15 Apr 2015 13:36:54 +0000 Objective. To investigate differences in tongue images of subjects with and without hyperuricemia. Materials and Methods. This population-based case-control study was performed in 2012–2013. We collected data from 46 case subjects with hyperuricemia and 46 control subjects, including results of biochemical examinations and tongue images. Symmetrical Haar-like features based on integral images were extracted from the tongue images. T-tests were performed to determine the ability of the extracted features to distinguish between the case and control groups. We first selected features using the common criterion , then conducted further examination of feature characteristics and feature selection using means and standard deviations of distributions in the case and control groups. Results. A total of 115,683 features were selected using the criterion .
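Haar-like features are usually computed from an integral image, so any rectangle sum costs four lookups. The paper's specific "symmetrical" feature templates are not described in the abstract. The sketch below therefore assumes a basic horizontal two-rectangle feature (left half minus right half), and the 3×4 "image" is hypothetical.

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows 0..y-1 and columns 0..x-1 (zero-padded)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle with top-left pixel (x, y), in constant time."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def two_rect_feature(ii, x, y, w, h):
    """Assumed horizontal two-rectangle Haar-like feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

# Hypothetical 3x4 grayscale patch.
img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
ii = integral_image(img)
print(rect_sum(ii, 0, 0, 4, 3), two_rect_feature(ii, 0, 0, 4, 3))  # 78 -12
```

Sliding such templates over every position and scale is what produces feature counts in the hundreds of thousands, as reported in the abstract.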
In this paper, we make the first attempt to apply the artificial bee colony (ABC) algorithm to the analysis of microarray gene expression profiles. In addition, we propose an innovative feature selection algorithm, mRMR-ABC, which combines minimum redundancy maximum relevance (mRMR) with the ABC algorithm to select informative genes from microarray profiles. The new approach uses a support vector machine (SVM) classifier to measure the classification accuracy of the selected genes. We evaluate the performance of the proposed mRMR-ABC algorithm through extensive experiments on six binary and multiclass gene expression microarray datasets. Furthermore, we compare mRMR-ABC with previously known techniques, two of which we reimplemented with the same parameters for a fair comparison: mRMR combined with a genetic algorithm (mRMR-GA) and mRMR combined with a particle swarm optimization algorithm (mRMR-PSO). Array comparative genomic hybridization (aCGH) techniques have developed rapidly, and recent evidence from breast cancer studies suggests that tumors within gene expression subtypes share similar DNA copy number aberrations (CNA), which can be used to further subdivide subtypes. Moreover, subtype-specific miRNA expression profiles have also been proposed as novel signatures for breast cancer classification. The identification of mRNA- or miRNA-expression-based breast cancer subtypes is considered an instructive means of prognosis. Here, we conducted an integrated analysis of copy number aberration data and miRNA-mRNA dual expression profiling data to identify breast cancer subtype-specific biomarkers. Interestingly, we found a group of genes residing in subtype-specific CNA regions that also display corresponding changes in their mRNA levels and in the expression of their target miRNAs. Therefore, investigating the pathogenesis of HPV16 is very important for public health.
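The mRMR filter that mRMR-ABC builds on can be sketched as a greedy search. Each step adds the gene whose mutual information with the class label, minus its mean mutual information with already-selected genes, is largest (the "difference" form of mRMR). This is a minimal sketch on discretized toy data. The ABC search and the SVM evaluation stage are not shown, and all gene names are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences of equal length."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(c / n * math.log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())

def mrmr(features, target, k):
    """Greedy mRMR (difference criterion): relevance minus mean redundancy.
    features: dict mapping a feature name to its list of discrete values."""
    relevance = {f: mutual_information(v, target) for f, v in features.items()}
    selected = [max(relevance, key=relevance.get)]
    while len(selected) < min(k, len(features)):
        def score(f):
            redundancy = sum(mutual_information(features[f], features[s])
                             for s in selected) / len(selected)
            return relevance[f] - redundancy
        remaining = [f for f in features if f not in selected]
        selected.append(max(remaining, key=score))
    return selected

# Hypothetical discretized expression: the class is determined by gene_a and gene_b jointly,
# and gene_a_copy is a fully redundant duplicate of gene_a.
y = [0, 1, 2, 3, 0, 1, 2, 3]
features = {
    "gene_a": [0, 1, 0, 1, 0, 1, 0, 1],
    "gene_a_copy": [0, 1, 0, 1, 0, 1, 0, 1],
    "gene_b": [0, 0, 1, 1, 0, 0, 1, 1],
}
print(mrmr(features, y, 2))  # ['gene_a', 'gene_b']
```

All three genes are equally relevant here. The redundancy term is what makes mRMR prefer `gene_b` over the duplicate, which a pure max-relevance ranking would not do.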
The protein-protein interaction (PPI) network between HPV16 and human proteins was used to improve our understanding of its pathogenesis. Using sequence and topological features, a support vector machine (SVM) model was built to predict new interactions between HPV16 and human proteins. All interactions were comprehensively investigated and analyzed. The analysis indicated that HPV16 enlarges its scope of influence by interacting with as many human proteins as possible. These interactions alter a broad range of cell cycle processes. Furthermore, HPV16 was not only highly prone to interact with hub and bottleneck proteins but could also effectively affect a broad range of signaling pathways. In addition, we found that HPV16 evolved high carcinogenicity on the condition that its own reproduction was ensured. We present a distribution-based approach for gene pair classification that identifies a disease-specific cutoff point classifying coexpressed gene pairs into strong and weak coexpression structures. Differences in the overall coexpression structure between the normal and CML groups were assessed with a two-sample Kolmogorov-Smirnov test. Our method effectively identified the coexpression pattern differences from the overall structure: for the maximum deviation . Moreover, we found that genes involved in ribosomal synthesis and translation tended to be coexpressed in the CML group. Conclusion. Our method can identify coexpression differences between two groups. Dysregulation of ribosomal synthesis and translation may be related to CML. Our findings may provide useful information for exploring novel CML mechanisms and for cancer treatment. Fengfeng Wang, Lawrence W.
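The CML study compares the coexpression distributions of two groups with a two-sample Kolmogorov-Smirnov test, whose statistic is the maximum deviation between the two empirical CDFs. The minimal sketch below computes only that statistic, with no p-value. The input values are hypothetical, for example coexpression coefficients of gene pairs in each group.

```python
from bisect import bisect_right

def ks_two_sample_statistic(a, b):
    """D = max over x of |F_a(x) - F_b(x)|, where F_a and F_b are empirical CDFs.
    Both ECDFs are step functions that jump only at observed values, so checking
    every observed value is enough to find the maximum."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:
        fa = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d

print(ks_two_sample_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```

For real analyses, `scipy.stats.ks_2samp` returns both this statistic and a p-value.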
Feature Selection Combined with Neural Network Structure Optimization for HIV-1 Protease Cleavage Site Prediction Wed, 15 Apr 2015 08:27:29 +0000 Understanding the specificity of HIV-1 protease is crucial for designing HIV-1 protease inhibitors. In this paper, a new feature selection method combined with neural network structure optimization is proposed to analyze the specificity of HIV-1 protease and to find the positions in an octapeptide that are important for determining its cleavability. Two kinds of newly proposed features based on the Amino Acid Index database, plus traditional orthogonal encoding features, are used, taking both physicochemical and sequence information into consideration. The results of feature selection show that , , , and are the most important positions. Two feature fusion methods, combination fusion and decision fusion, are used with the aim of obtaining a comprehensive feature representation and improving prediction performance. Reverse vaccinology uses the entire proteome of a pathogen to select the best vaccine antigens by in silico approaches. The M. tuberculosis H37Rv proteome was analyzed with NERVE (New Enhanced Reverse Vaccinology Environment) prediction software to identify potential vaccine targets; the 331 proteins identified were further analyzed with VaxiJen to determine their antigenicity values. Only candidates with an antigenicity value ≥0.5 and an adhesin probability of at least 50%, and without homology to human proteins or transmembrane regions, were selected, resulting in 73 antigens. These proteins were grouped by family into seven groups and analyzed by amino acid sequence alignment, and 16 representative proteins were selected. For each candidate, a literature search and protein analysis with different bioinformatics tools, as well as a simulation of the immune response, were conducted. We have recently reported the first miRNome analysis of exosomes secreted from Nef-expressing U937 monocytic cells.
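"Orthogonal encoding" of an octapeptide conventionally means one-hot encoding each of its eight residues over the 20 standard amino acids, which gives a 160-dimensional binary vector. The sketch below assumes that convention and an alphabetical residue order. The AAindex-based features from the paper are not reproduced, and the example peptide is only an illustrative string.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 standard residues

def orthogonal_encoding(peptide):
    """One-hot (orthogonal) encoding of an octapeptide: 8 residues x 20 bits = 160 values."""
    if len(peptide) != 8:
        raise ValueError("expected an octapeptide (8 residues)")
    vec = []
    for residue in peptide.upper():
        idx = AMINO_ACIDS.find(residue)
        if idx < 0:
            raise ValueError(f"unknown residue {residue!r}")
        one_hot = [0] * len(AMINO_ACIDS)
        one_hot[idx] = 1  # exactly one bit set per position
        vec.extend(one_hot)
    return vec

v = orthogonal_encoding("SQNYPIVQ")
print(len(v), sum(v))  # 160 8
```

Because the encoding is position-specific, the weight a classifier assigns to a block of 20 bits can be read as the importance of that position in the octapeptide.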
Here we show a genome-wide transcriptome analysis of Nef-expressing U937 cells and their exosomes. We identified four key mRNAs preferentially retained in Nef-expressing cells; these code for MECP2, HMOX1, AARSD1, and ATF2, which are important for chromatin modification and gene expression. Interestingly, their target miRNAs are exported in exosomes. We also identified three key mRNAs selectively secreted in exosomes from Nef-expressing U937 cells, whose corresponding miRNAs are preferentially retained in the cells. These code for AATK, SLC27A1, and CDKAL, which are important in apoptosis and fatty acid transport. Thus, our study identifies selectively expressed mRNAs in Nef-expressing U937 cells and their exosomes and supports a new mode of intercellular regulation by the HIV-1 Nef protein. A Gas Chromatography-Mass Spectrometry Based Study on Urine Metabolomics in Rats Chronically Poisoned with Hydrogen Sulfide Tue, 14 Apr 2015 17:02:10 +0000 Gas chromatography-mass spectrometry (GC-MS) in combination with multivariate statistical analysis was applied to explore the metabolic variability in the urine of rats chronically poisoned with hydrogen sulfide (H2S) relative to controls. Changes in endogenous metabolites were studied by partial least squares-discriminant analysis (PLS-DA) and independent-samples t-tests. The metabolic patterns of the H2S-poisoned group were separated from those of the control group, suggesting that the metabolic profiles of H2S-poisoned rats were markedly different from those of the controls. An Integrated Modeling and Experimental Approach to Study the Influence of Environmental Nutrients on Biofilm Formation of Pseudomonas aeruginosa Tue, 14 Apr 2015 16:59:46 +0000 The availability of nutrient components in the environment was identified as a critical regulator of virulence and biofilm formation in Pseudomonas aeruginosa. This work proposes the first systems-biology approach to quantify microbial biofilm formation upon changes in nutrient availability in the environment.
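The H2S metabolomics study screens metabolites with independent-samples t-tests. The sketch below computes Student's pooled-variance t statistic for one metabolite. The equal-variance assumption and the intensity values are illustrative only, and the PLS-DA step is not shown.

```python
import math
from statistics import mean, variance

def independent_t(a, b):
    """Student's two-sample t statistic with pooled variance, plus degrees of freedom.
    statistics.variance is the sample (n - 1) variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2

# Hypothetical relative intensities of one urinary metabolite.
control = [1, 2, 3]
poisoned = [4, 5, 6]
print(independent_t(control, poisoned))  # t is about -3.674 with 4 degrees of freedom
```

`scipy.stats.ttest_ind` with its default `equal_var=True` computes the same statistic and also returns a p-value. With many metabolites, the p-values would normally be corrected for multiple testing.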
Specifically, changes in the fluxes of metabolic reactions positively associated with P. aeruginosa biofilm formation were used to monitor its tendency to form a biofilm. The uptake rates of nutrient components were adjusted according to changes in nutrient availability. We found that adding each of eleven amino acids (Arg, Tyr, Phe, His, Iso, Orn, Pro, Glu, Leu, Val, and Asp) to minimal medium promoted P. aeruginosa biofilm formation. Both modeling and experimental approaches were further developed to quantify P. aeruginosa biofilm formation at four availability levels for each of three ions: ferrous ions, sulfate, and phosphate. The developed modeling approach correctly predicted the amount of biofilm formation. Comparing reaction flux changes across nutrient concentrations showed that the metabolic reactions P. aeruginosa uses to regulate its biofilm formation are mainly involved in arginine metabolism, glutamate production, magnesium transport, acetate metabolism, and the TCA cycle. Zhaobin Xu, Sabina Islam, Thomas K. Wood, and Zuyi Huang Copyright © 2015 Zhaobin Xu et al. All rights reserved. Computer-Simulated Biopsy Marking System for Endoscopic Surveillance of Gastric Lesions: A Pilot Study Tue, 14 Apr 2015 16:57:39 +0000 Endoscopic tattooing with India ink injection for surveillance of premalignant gastric lesions is technically cumbersome and may not be durable. The aim of this study was to evaluate the accuracy of a novel computer-simulated biopsy marking system (CSBMS) developed for the endoscopic marking of gastric lesions. Twenty-five patients with a history of gastric intestinal metaplasia received both CSBMS-guided marking and India ink injection at five points in the stomach at index endoscopy. A second endoscopy was performed at three months. The primary outcome was the accuracy of CSBMS (the distance between the CSBMS probe-guided site and the tattoo site, measured by CSBMS).
The mean accuracy of CSBMS at angularis was  mm, antral lesser curvature  mm, antral greater curvature  mm, antral anterior wall  mm, and antral posterior wall  mm. CSBMS ( versus seconds; ) required less procedure time than endoscopic tattooing. No adverse events were encountered. On follow-up endoscopy, CSBMS accurately identified gastric sites previously marked by endoscopic tattooing to within 1 cm. Here, we propose a factor-based experimental design approach that enables scientists to easily create large-scale experiments with the help of a web-based system. We present a novel implementation of a web-based interface that allows the collection of arbitrary metadata. To exchange and edit information, we provide a spreadsheet-based, human-readable format. Sample sheets with identifiers and metainformation for data generation facilities can then be created. Data files created after measurement of the samples can be uploaded to a datastore, where they are automatically linked to the previously created experimental design model. Andreas Friedrich, Erhan Kenar, Oliver Kohlbacher, and Sven Nahnsen Copyright © 2015 Andreas Friedrich et al. All rights reserved. A Method for Generating New Datasets Based on Copy Number for Cancer Analysis Wed, 08 Apr 2015 12:22:40 +0000 New data sources for cancer analysis are rapidly supplementing the large number of gene-expression markers used by current analysis methods. Significant among these new sources are copy number variation (CNV) datasets, which typically enumerate several hundred thousand CNVs distributed throughout the genome. Several useful algorithms allow systems-level analyses of such datasets. However, these rich data sources have not yet been analyzed as deeply as gene-expression data.
To address this issue, the extensive toolsets used for analyzing expression data in cancerous and noncancerous tissue (e.g., gene set enrichment analysis and phenotype prediction) could be redirected to extract a great deal of predictive information from CNV data, in particular CNV data derived from cancers. The wide range of methods and the diversity of file formats used in sequence analysis are a significant issue, and a considerable amount of time is spent before anyone can even attempt to analyse the genetic basis of human disorders. Another point to consider is that, although many researchers possess “just enough” knowledge to analyse their data, they do not make full use of the available tools and databases and do not fully understand how their data were created. The primary aim of this review is to document some of the key approaches and to provide an analysis schema that makes the analysis process more efficient and reliable in the context of discovering highly penetrant causal mutations/genes. This review also compares the methods used to identify highly penetrant variants when data are obtained from consanguineous as opposed to nonconsanguineous individuals, and when Mendelian as opposed to common-complex disorders are analysed. A. In Silico Search of Energy Metabolism Inhibitors for Alternative Leishmaniasis Treatments Mon, 30 Mar 2015 13:56:04 +0000 Leishmaniasis is a complex disease that affects mammals and is caused by approximately 20 distinct protozoan species of the genus Leishmania. Leishmaniasis is an endemic disease with a large socioeconomic impact on poor and developing countries. The current treatment for leishmaniasis is complex, expensive, and poorly efficacious. Thus, there is an urgent need to develop new, more selective, and less expensive drugs. The energy metabolism pathways of Leishmania include several interesting targets for specific inhibitors.
In the present study, we sought to establish which energy metabolism enzymes in Leishmania could be targets for inhibitors already approved for the treatment of other diseases. We identified 94 genes and 93 Leishmania energy metabolism targets. However, the traditional sample preparation protocol for NGS is non-strand-specific (NSS), leading to biased expression estimates for transcripts that overlap on the antisense strand. Strand-specific (SS) protocols have recently been developed. In this study, we prepared the same RNA sample using the SS and NSS protocols, followed by sequencing on the Illumina HiSeq platform. Using real-time quantitative PCR as a standard, we first showed that the SS protocol estimates gene expression more precisely than the NSS protocol, particularly for genes that overlap on the antisense strand. We also showed that the sequence reads from the SS protocol are comparable in many respects with those from conventional NSS protocols. Finally, we mapped a fraction of the sequence reads to the antisense strands of known genes, where no genes had originally been annotated. A Network Flow Approach to Predict Protein Targets and Flavonoid Backbones to Treat Respiratory Syncytial Virus Infection Mon, 23 Mar 2015 06:08:56 +0000 Background. Respiratory syncytial virus (RSV) infection is the major cause of lower respiratory tract disease in infants and young children. Attempts to develop effective vaccines or pharmacological treatments that inhibit RSV infection without undesired effects on human health have been unsuccessful. However, RSV infection has been reported to be affected by flavonoids. The mechanisms underlying viral inhibition by these compounds are largely unknown, making the development of new drugs difficult. Methods.
To understand the mechanisms by which flavonoids inhibit RSV infection, a systems pharmacology-based study was performed using microarray data from primary cultures of human bronchial cells infected by RSV, together with compound-proteomic interaction data available for Homo sapiens. Results. Identification of Novel Thyroid Cancer-Related Genes and Chemicals Using Shortest Path Algorithm Sun, 22 Mar 2015 11:26:51 +0000 Thyroid cancer is a typical endocrine malignancy. Over the past three decades, the continued growth of its incidence has made it urgent to design effective treatments for this disease. To this end, it is necessary to uncover the mechanism underlying the disease. Identification of thyroid cancer-related genes and chemicals helps in understanding the mechanism of thyroid cancer. In this study, we generalized some previous methods to discover both disease genes and chemicals. The method was based on a shortest path algorithm and was applied to discover novel thyroid cancer-related genes and chemicals. Analysis of the genes and chemicals finally obtained suggests that some of them are crucial to the formation and development of thyroid cancer. These results indicate that the proposed method is effective for discovering novel disease genes and chemicals. Agent-Based Spatiotemporal Simulation of Biomolecular Systems within the Open Source MASON Framework Sun, 22 Mar 2015 10:04:24 +0000 Agent-based modelling is being used to represent biological systems with increasing frequency and success. This paper presents the implementation of a new tool for biomolecular reaction modelling in the open source Multiagent Simulator of Neighborhoods (MASON) framework. The rationale behind this new tool is the need to describe interactions at the molecular level in order to grasp emergent and meaningful biological behaviour. We are particularly interested in characterising and quantifying the various effects that facilitate biocatalysis.
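The thyroid cancer study is built on a shortest path algorithm over a network of genes and chemicals. The hedged sketch below uses Dijkstra's algorithm with path reconstruction. The node names and edge weights are hypothetical, and so is the idea that lower weights stand for stronger associations. For an undirected interaction network, each edge would be stored in both directions.

```python
import heapq

def shortest_path(graph, source, target):
    """Dijkstra's algorithm on a dict-of-dicts graph with non-negative edge weights.
    Returns (path as a list of nodes, total weight), or (None, inf) if unreachable."""
    dist, prev = {source: 0.0}, {}
    heap, visited = [(0.0, source)], set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry
        visited.add(node)
        if node == target:
            break
        for nbr, w in graph.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr], prev[nbr] = nd, node
                heapq.heappush(heap, (nd, nbr))
    if target not in dist:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]

# Hypothetical gene/chemical network.
graph = {"G1": {"G2": 1.0, "C1": 4.0}, "G2": {"C1": 1.0, "G3": 5.0}, "C1": {"G3": 1.0}}
print(shortest_path(graph, "G1", "G3"))  # (['G1', 'G2', 'C1', 'G3'], 3.0)
```

In this style of method, nodes that lie on shortest paths between known disease genes are the ones nominated as new candidates.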
Enzymes may display high specificity for their substrates, and this information is crucial to the engineering and optimisation of bioprocesses. Using the eServices Platform for Detecting Behavior Patterns Deviation in the Elderly Assisted Living: A Case Study Sun, 22 Mar 2015 09:33:56 +0000 The world’s aging population is rising, and the elderly are increasingly isolated socially and geographically. As a consequence, in many situations they need assistance that is not provided in time. In this paper, we present a solution that follows the CRISP-DM methodology to detect deviations in the behavior patterns of elderly people that may indicate possible risk situations. To obtain these patterns, many variables are aggregated to ensure the reliability of the alert system and to minimize false positive alerts. These variables comprise information provided by a body area network (BAN), by environment sensors, and by the elderly person’s interaction with a service provider platform called eServices (Elderly Support Service Platform). eServices is a scalable platform aggregating a service ecosystem developed specially for elderly people. A Distributed Multiagent System Architecture for Body Area Networks Applied to Healthcare Monitoring Sun, 22 Mar 2015 09:23:02 +0000 In recent years, the area of health monitoring has grown significantly, attracting the attention of both academia and commercial sectors. At the same time, the availability of new biomedical sensors and suitable network protocols has led to a new generation of wireless sensor networks, the so-called wireless body area networks. Nowadays, these networks are routinely used for continuous monitoring of people’s vital parameters, movement, and surrounding environment, but the large volume of data generated in different locations is a major obstacle to the design, development, and deployment of more elaborate intelligent systems.
RecRWR: A Recursive Random Walk Method for Improved Identification of Diseases Sun, 22 Mar 2015 09:18:34 +0000 High-throughput methods such as next-generation sequencing or DNA microarrays lack precision, as they return hundreds of genes for a single disease profile. Several computational methods applied to protein-protein physical interaction networks have been successfully used to identify the best disease candidates for each expression profile. An open problem for these methods is the ability to combine and take advantage of the wealth of publicly available biomedical data. We propose an enhanced method to improve the selection of the best disease targets for a multilayer biomedical network that integrates PPI data annotated with stable knowledge from OMIM diseases and GO biological processes. We present a comprehensive validation that demonstrates the advantage of the proposed approach, Recursive Random Walk with Restarts (RecRWR). We evaluate various pairwise kernels to establish which are most informative and compare individual kernel accuracies with accuracies for weighted combinations. By associating a probability measure with classifier predictions, we enable cautious classification, which can increase accuracy by restricting predictions to high-confidence instances, and data cleaning, which can mitigate the influence of mislabeled training instances. Although one pairwise kernel (the tensor product pairwise kernel) appears to work best, different kernels may contribute complementary information about interactions: experiments in S. cerevisiae (yeast) reveal that a weighted combination of pairwise kernels applied to different types of data yields the highest predictive accuracy. Combined with cautious classification and data cleaning, we can achieve predictive accuracies of up to 99.6%. Mark F. Rogers, Colin Campbell, and Yiming Ying Copyright © 2015 Mark F. Rogers et al. All rights reserved.
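RecRWR builds on random walk with restart (RWR). The sketch below shows only the standard single-layer RWR that such methods start from, not the recursive multilayer procedure of the paper: probability mass spreads from seed (known disease) nodes through a column-normalized adjacency matrix $W$ and is pulled back to the seeds with restart probability $r$, iterating $p_{t+1} = (1-r)\,W p_t + r\,p_0$ until convergence. The toy network, the restart value 0.3, and the function name are assumptions for illustration.

```python
import numpy as np

def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) W p + r p0 until the L1 change falls below tol."""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # avoid division by zero; isolated nodes would leak mass
    W = adj / col_sums  # column-stochastic transition matrix
    p0 = np.zeros(adj.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)  # restart distribution spread evenly over seeds
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * W @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Hypothetical 5-node chain 0-1-2-3-4 with node 0 as the only known disease gene.
adj = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
])
scores = random_walk_with_restart(adj, seeds=[0])
print(np.round(scores, 3))
```

On this chain the steady-state scores decrease with distance from the seed, which is the behaviour used to rank candidate genes; the multilayer network in RecRWR (PPI, OMIM, GO) would replace the single adjacency matrix.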
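The pairwise-kernel abstract mentions the tensor product pairwise kernel, weighted kernel combinations, and cautious classification. A minimal sketch of those three ideas follows, assuming the standard TPPK definition $K((a,b),(c,d)) = k(a,c)\,k(b,d) + k(a,d)\,k(b,c)$ over a base protein kernel $k$. The linear base kernel, the equal combination weights, the 0.9 confidence threshold, and all function names are hypothetical; this is not the authors' code.

```python
import numpy as np

def tppk(k, pair1, pair2):
    """Tensor product pairwise kernel between protein pairs (a, b) and (c, d)."""
    a, b = pair1
    c, d = pair2
    return k[a, c] * k[b, d] + k[a, d] * k[b, c]

def pairwise_gram(k, pairs):
    """Gram matrix of the TPPK over a list of protein index pairs."""
    n = len(pairs)
    G = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            G[i, j] = tppk(k, pairs[i], pairs[j])
    return G

def combine(grams, weights):
    """Weighted sum of Gram matrices; non-negative weights keep the result a valid kernel."""
    return sum(w * G for w, G in zip(weights, grams))

def cautious_predict(probabilities, threshold=0.9):
    """Return 1 or 0 for confident predictions and None (abstain) otherwise."""
    out = []
    for p in probabilities:
        if p >= threshold:
            out.append(1)
        elif p <= 1 - threshold:
            out.append(0)
        else:
            out.append(None)
    return out

# Hypothetical features for three proteins and a linear base kernel k = X X^T.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
k = X @ X.T
pairs = [(0, 1), (0, 2), (1, 2)]
G = pairwise_gram(k, pairs)
G_combined = combine([G, np.eye(3)], [0.5, 0.5])
print(G)
print(cautious_predict([0.95, 0.5, 0.05]))
```

The TPPK is symmetric in the order of proteins within each pair, which suits undirected interactions; abstaining on mid-range probabilities is what lets cautious classification raise accuracy on the predictions it does make.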
aCGH-MAS: Analysis of aCGH by means of Multiagent System Sun, 22 Mar 2015 08:55:39 +0000 There are currently different techniques, such as CGH arrays, to study genetic variations in patients. CGH arrays analyze gains and losses in different regions of the chromosome. Regions with gains or losses in pathologies are important for selecting relevant genes or CNVs (copy-number variations) associated with the variations detected within chromosomes. Information on mutations, genes, proteins, variations, CNVs, and diseases can be found in different databases, and it would be useful to integrate information from these sources to extract relevant information. This work proposes a multiagent system to manage the information of aCGH arrays, with the aim of providing an intuitive and extensible system to analyze and interpret the results. However, the complexity of metabolic networks has made identifying the effects of genetic modification on desirable phenotypes challenging. Furthermore, the vast number of reactions in cellular metabolism often leads to a combinatorial problem in obtaining optimal gene knockouts. The computational time increases exponentially as the size of the problem increases. This work reports an extension of Bees Hill Flux Balance Analysis (BHFBA) to identify optimal gene knockouts that maximise the production yield of desired phenotypes while sustaining the growth rate. The proposed method integrates OptKnock into BHFBA to validate the results automatically. The results show that the extended BHFBA is suitable, reliable, and applicable for predicting gene knockouts. Over the last 100 years, the most commonly used material has been silver amalgam, which, while very durable, is somewhat aesthetically displeasing. Our study is based on the collection of data from the charts, notes, and radiographic information of restorative treatments performed by Dr.
Vera in 1993, the analysis of this information by artificial intelligence to determine the most appropriate restoration, and the monitoring of the evolution of the dental restoration. The data will be treated confidentially in accordance with Organic Law 15/1999 of 13 December on the Protection of Personal Data. This paper also presents a clustering technique capable of identifying the most significant cases with which to instantiate the case base. To classify the cases, a mixture of experts is used that incorporates a Bayesian network and a multilayer perceptron; the two classifiers are combined by a neural network. Ignacio J. Aliaga, Vicente Vera, Juan F. Clinical history was obtained by means of anamnesis and physical examination, and preoperative imaging and urine cytology were carried out for all patients. Patients then underwent conventional transurethral resection of bladder tumor (TURBT), and proteomic analyses quantified the biomarkers (p53, neu, and EGFR). A postoperative follow-up was performed to detect relapse and progression. Clustering was performed to find groups based on clinical and molecular markers, histopathological prognostic factors, and statistics on recurrence, progression, and overall survival of patients with NMIBC. Four groups were found according to tumor size, risk of relapse or progression, and biological behavior. Outlier patients were also detected and categorized according to their clinical characteristics and biological behavior.
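The BHFBA abstract above notes that the vast number of metabolic reactions turns optimal gene knockout into a combinatorial problem whose cost grows exponentially. A short counting sketch makes this concrete: with $n$ candidate reactions and up to $m$ simultaneous deletions, exhaustive search must evaluate $\sum_{i=1}^{m} \binom{n}{i}$ knockout sets. The figure of 2,000 reactions is an illustrative assumption, not a number from the study, and the function name is hypothetical.

```python
from math import comb

def knockout_search_space(n_reactions, max_knockouts):
    """Number of distinct deletion sets containing 1..max_knockouts reactions."""
    return sum(comb(n_reactions, k) for k in range(1, max_knockouts + 1))

# Hypothetical model size: 2,000 candidate reactions.
for m in range(1, 5):
    print(m, knockout_search_space(2000, m))
```

Allowing just three deletions already gives more than 1.3 billion candidate sets, each requiring at least one flux balance evaluation, which is why heuristic searches such as the bees-and-hill-climbing approach are used instead of enumeration.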