Advances in Bioinformatics The latest articles from Hindawi Publishing Corporation © 2016, Hindawi Publishing Corporation. All rights reserved. Evaluation of Bioinformatic Programmes for the Analysis of Variants within Splice Site Consensus Regions Tue, 24 May 2016 11:20:45 +0000 The increasing diagnostic use of gene sequencing has led to an expanding dataset of novel variants that lie within consensus splice junctions. The challenge for diagnostic laboratories is the evaluation of these variants in order to determine if they affect splicing or are merely benign. A common evaluation strategy is to use in silico analysis, and it is here that a number of programmes are available online; however, currently, there are no consensus guidelines on the selection of programmes or protocols to interpret the prediction results. Using a collection of 222 pathogenic mutations and 50 benign polymorphisms, we evaluated the sensitivity and specificity of four in silico programmes in predicting the effect of each variant on splicing. The programmes comprised Human Splice Finder (HSF), Max Entropy Scan (MES), NNSplice, and ASSP. The MES and ASSP programmes gave the highest performance based on Receiver Operator Curve analysis, with an optimal cut-off of score reduction of 10%. The study also showed that the sensitivity of prediction is affected by the level of conservation of individual positions, with in silico predictions for variants at positions 4 and +7 within consensus splice sites being largely uninformative. Rongying Tang, Debra O. Prosser, and Donald R. Love Copyright © 2016 Rongying Tang et al. All rights reserved. A Support Vector Machine Classification of Thyroid Bioptic Specimens Using MALDI-MSI Data Tue, 17 May 2016 13:58:40 +0000 Biomarkers able to characterise and predict multifactorial diseases are still one of the most important targets for all the “omics” investigations.
In this context, Matrix-Assisted Laser Desorption/Ionisation-Mass Spectrometry Imaging (MALDI-MSI) has gained considerable attention in recent years, but it also led to a huge amount of complex data to be elaborated and interpreted. For this reason, computational and machine learning procedures for biomarker discovery are important tools to consider, both to reduce data dimension and to provide predictive markers for specific diseases. For instance, the availability of protein and genetic markers to support thyroid lesion diagnoses would impact deeply on society due to the high presence of undetermined reports (THY3) that are generally treated as malignant patients. In this paper we show how an accurate classification of thyroid bioptic specimens can be obtained through the application of a state-of-the-art machine learning approach (i.e., Support Vector Machines) on MALDI-MSI data, together with a particular wrapper feature selection algorithm (i.e., recursive feature elimination). The model is able to provide an accurate discriminatory capability using only 20 out of 144 features, resulting in an increase of the model performances, reliability, and computational efficiency. Finally, tissue areas rather than average proteomic profiles are classified, highlighting potential discriminating areas of clinical interest. Manuel Galli, Italo Zoppis, Gabriele De Sio, Clizia Chinello, Fabio Pagni, Fulvio Magni, and Giancarlo Mauri Copyright © 2016 Manuel Galli et al. All rights reserved. Expressing Redundancy among Linear-Epitope Sequence Data Based on Residue-Level Physicochemical Similarity in the Context of Antigenic Cross-Reaction Wed, 04 May 2016 09:57:04 +0000 Epitope-based design of vaccines, immunotherapeutics, and immunodiagnostics is complicated by structural changes that radically alter immunological outcomes. 
This is obscured by expressing redundancy among linear-epitope data as fractional sequence-alignment identity, which fails to account for potentially drastic loss of binding affinity due to single-residue substitutions even where these might be considered conservative in the context of classical sequence analysis. From the perspective of immune function based on molecular recognition of epitopes, functional redundancy of epitope data (FRED) thus may be defined in a biologically more meaningful way based on residue-level physicochemical similarity in the context of antigenic cross-reaction, with functional similarity between epitopes expressed as the Shannon information entropy for differential epitope binding. Such similarity may be estimated in terms of structural differences between an immunogen epitope and an antigen epitope with reference to an idealized binding site of high complementarity to the immunogen epitope, by analogy between protein folding and ligand-receptor binding; but this underestimates potential for cross-reactivity, suggesting that epitope-binding site complementarity is typically suboptimal as regards immunologic specificity. The apparently suboptimal complementarity may reflect a tradeoff to attain optimal immune function that favors generation of immune-system components each having potential for cross-reactivity with a variety of epitopes. Salvador Eugenio C. Caoili Copyright © 2016 Salvador Eugenio C. Caoili. All rights reserved. Ebolavirus Database: Gene and Protein Information Resource for Ebolaviruses Thu, 14 Apr 2016 11:54:54 +0000 Ebola Virus Disease (EVD) is a life-threatening haemorrhagic fever in humans. Even though there are many reports on EVD, the protein precursor functions and virulent factors of ebolaviruses remain poorly understood. Comparative analyses of Ebolavirus genomes will help in the identification of these important features. 
This prompted us to develop the Ebolavirus Database (EDB), and we have provided links to various tools that will help researchers locate important regions in both the genomes and proteomes of Ebolavirus. The genomic analyses of ebolaviruses will provide important clues for locating the essential and core functional genes. The aim of EDB is to act as an integrated resource for ebolaviruses and we strongly believe that the database will be a useful tool for clinicians, microbiologists, health care workers, and bioscience researchers. Rayapadi G. Swetha, Sudha Ramaiah, Anand Anbarasu, and Kanagaraj Sekar Copyright © 2016 Rayapadi G. Swetha et al. All rights reserved. Feature Selection Has a Large Impact on One-Class Classification Accuracy for MicroRNAs in Plants Tue, 12 Apr 2016 14:00:26 +0000 MicroRNAs (miRNAs) are short RNA sequences involved in posttranscriptional gene regulation. Their experimental analysis is complicated and, therefore, needs to be supplemented with computational miRNA detection. Currently, computational miRNA detection is mainly performed using machine learning and in particular two-class classification. For machine learning, the miRNAs need to be parametrized and more than 700 features have been described. Positive training examples for machine learning are readily available, but negative data is hard to come by. Therefore, it seems preferable to use one-class classification instead of two-class classification. Previously, we were able to almost reach two-class classification accuracy using one-class classifiers. In this work, we employ feature selection procedures in conjunction with one-class classification and show that there is up to 36% difference in accuracy among these feature selection methods. The best feature set allowed the training of a one-class classifier which achieved an average accuracy of ~95.6%, thereby outperforming previous two-class-based plant miRNA detection approaches by about 0.5%.
We believe that this can be improved upon in the future by rigorous filtering of the positive training examples and by improving current feature clustering algorithms to better target pre-miRNA feature selection. Malik Yousef, Müşerref Duygu Saçar Demirci, Waleed Khalifa, and Jens Allmer Copyright © 2016 Malik Yousef et al. All rights reserved. Molecular Docking and In Silico ADMET Study Reveals Acylguanidine 7a as a Potential Inhibitor of β-Secretase Sun, 10 Apr 2016 09:49:34 +0000 Amyloidogenic pathway in Alzheimer’s disease (AD) involves breakdown of APP by β-secretase followed by γ-secretase and results in formation of amyloid beta plaque. β-secretase has been a promising target for developing novel anti-Alzheimer drugs. To test different molecules for this purpose, test ligands like acylguanidine 7a, rosiglitazone, pioglitazone, and tartaric acid were docked against our target protein β-secretase enzyme retrieved from Protein Data Bank, considering MK-8931 (phase III trial, Merck) as the positive control. Docking revealed that, with respect to their free binding energy, acylguanidine 7a has the lowest binding energy followed by MK-8931 and pioglitazone and binds significantly to β-secretase. In silico ADMET predictions revealed that except tartaric acid all other compounds had minimal toxic effects and had good absorption as well as solubility characteristics. These compounds may serve as potential lead compound for developing new anti-Alzheimer drug. Chaluveelaveedu Murleedharan Nisha, Ashwini Kumar, Prateek Nair, Nityasha Gupta, Chitrangda Silakari, Timir Tripathi, and Awanish Kumar Copyright © 2016 Chaluveelaveedu Murleedharan Nisha et al. All rights reserved. Robust Feature Selection from Microarray Data Based on Cooperative Game Theory and Qualitative Mutual Information Sun, 20 Mar 2016 09:47:50 +0000 High dimensionality of microarray data sets may lead to low efficiency and overfitting. 
In this paper, a multiphase cooperative game theoretic feature selection approach is proposed for microarray data classification. In the first phase, due to high dimension of microarray data sets, the features are reduced using one of the two filter-based feature selection methods, namely, mutual information and Fisher ratio. In the second phase, Shapley index is used to evaluate the power of each feature. The main innovation of the proposed approach is to employ Qualitative Mutual Information (QMI) for this purpose. The idea of Qualitative Mutual Information causes the selected features to have more stability and this stability helps to deal with the problem of data imbalance and scarcity. In the third phase, a forward selection scheme is applied which uses a scoring function to weight each feature. The performance of the proposed method is compared with other popular feature selection algorithms such as Fisher ratio, minimum redundancy maximum relevance, and previous works on cooperative game based feature selection. The average classification accuracy on eleven microarray data sets shows that the proposed method improves both average accuracy and average stability compared to other approaches. Atiyeh Mortazavi and Mohammad Hossein Moattar Copyright © 2016 Atiyeh Mortazavi and Mohammad Hossein Moattar. All rights reserved. Efficacy and Toxicity Assessment of Different Antibody Based Antiangiogenic Drugs by Computational Docking Method Mon, 07 Mar 2016 12:07:42 +0000 Bevacizumab and trastuzumab are two antibody based antiangiogenic drugs that are in clinical practice for the treatment of different cancers. Presently applications of these drugs are based on the empirical choice of clinical experts that follow towards population based clinical trials and, hence, their molecular efficacies in terms of quantitative estimates are not being explored. Moreover, different clinical trials with these drugs showed different toxicity symptoms in patients. 
Here, using a molecular docking study, we attempt to reveal the molecular rationale regarding their efficacy and off-target toxicity. Our study reinforces their antiangiogenic potential and indicates that, of the two, trastuzumab has much higher efficacy; however, it also reveals that, compared to bevacizumab, trastuzumab has a higher toxic effect, especially on the cardiovascular system. This study also reveals the molecular rationale of ocular dysfunction by antiangiogenic drugs. The molecular rationale of toxicity as revealed in this study may help in the judicious choice as well as therapeutic scheduling of these drugs in different cancers. Sayan Mukherjee, Gopa Chatterjee, Moumita Ghosh, Bishwajit Das, and Durjoy Majumder Copyright © 2016 Sayan Mukherjee et al. All rights reserved. Structural Dynamics of Human Argonaute2 and Its Interaction with siRNAs Designed to Target Mutant tdp43 Sun, 06 Mar 2016 12:52:41 +0000 The human Argonaute2 protein (Ago2) is a key player in the RNA interference pathway, and small RNA recognition by Ago2 is the crucial step in the siRNA-mediated gene silencing mechanism. The present study highlights, for the first time, the structural and functional dynamics of human Ago2 and the interaction mechanism of Ago2 with a set of seven siRNAs. The human Ago2 protein adopts two conformations, “open” and “closed”, during the 25 ns simulation. One of the domains, named PAZ, which is responsible for anchoring the 3′-end of the siRNA guide strand, is observed to be a highly flexible region. The interaction between Ago2 and siRNA, analyzed using a set of siRNAs (targeting positions 128, 251, 341, 383, 537, 1113, and 1115 of the mRNA) designed to target tdp43 mutants causing Amyotrophic Lateral Sclerosis (ALS), revealed stable and strong recognition of siRNA by the Ago2 protein during dynamics.
Among the studied siRNAs, siRNA341 is identified as a potent siRNA to recognize Ago2 and hence could be used further as a possible siRNA candidate to target the mutant tdp43 protein for the treatment of ALS patients. Vishwambhar Bhandare and Amutha Ramaswamy Copyright © 2016 Vishwambhar Bhandare and Amutha Ramaswamy. All rights reserved. Random versus Deterministic Descent in RNA Energy Landscape Analysis Wed, 02 Mar 2016 06:49:32 +0000 Identifying sets of metastable conformations is a major research topic in RNA energy landscape analysis, and recently several methods have been proposed for finding local minima in landscapes spawned by RNA secondary structures. An important and time-critical component of such methods is steepest, or gradient, descent in attraction basins of local minima. We analyse the speed-up achievable by randomised descent in attraction basins in the context of large sample sets where the size has an order of magnitude in the region of ~10^6. While the gain for each individual sample might be marginal, the overall run-time improvement can be significant. Moreover, for the two nongradient methods we analysed for partial energy landscapes induced by ten different RNA sequences, we found that the number of observed local minima is on average larger by 7.3% and 3.5%, respectively. The run-time improvement is approximately 16.6% and 6.8% on average over the ten partial energy landscapes. For the large sample size we selected for descent procedures, the coverage of local minima is very high up to energy values of the region where the samples were randomly selected from the partial energy landscapes; that is, the difference to the total set of local minima is mainly due to the upper area of the energy landscapes. Luke Day, Ouala Abdelhadi Ep Souki, Andreas A. Albrecht, and Kathleen Steinhöfel Copyright © 2016 Luke Day et al. All rights reserved.
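The contrast between steepest descent and randomised descent described in the abstract above can be illustrated with a minimal sketch. It rests on several assumptions that are not in the abstract: states are bit strings rather than RNA secondary structures, a move flips one bit rather than adding or removing a base pair, `make_toy_energy` is a hypothetical rugged energy function, and "take the first improving neighbour in random order" is only one possible randomised variant. The abstract does not say which two nongradient methods the authors analysed.

```python
import random


def make_toy_energy(n, seed=0):
    """Build a hypothetical rugged energy function over bit tuples of length n."""
    rng = random.Random(seed)
    # One random contribution per adjacent pair of bits and per pair value.
    table = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(n - 1)]

    def energy(state):
        return sum(table[i][2 * state[i] + state[i + 1]] for i in range(n - 1))

    return energy


def neighbours(state):
    """All states one bit flip away (a toy move set, not RNA base-pair moves)."""
    return [state[:i] + (1 - state[i],) + state[i + 1:] for i in range(len(state))]


def steepest_descent(state, energy):
    """Deterministic descent: always move to the lowest-energy neighbour."""
    while True:
        best = min(neighbours(state), key=energy)
        if energy(best) >= energy(state):
            return state  # no neighbour is strictly lower: local minimum
        state = best


def random_descent(state, energy, rng):
    """Randomised descent: move to the first improving neighbour in random order."""
    while True:
        current = energy(state)
        candidates = neighbours(state)
        rng.shuffle(candidates)
        for cand in candidates:
            if energy(cand) < current:
                state = cand
                break  # accept without evaluating the remaining neighbours
        else:
            return state  # no improving neighbour: local minimum


def is_local_minimum(state, energy):
    """True if no single bit flip lowers the energy."""
    return all(energy(nb) >= energy(state) for nb in neighbours(state))


if __name__ == "__main__":
    n = 12
    energy = make_toy_energy(n, seed=1)
    rng = random.Random(2)
    starts = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(200)]
    minima_steepest = {steepest_descent(s, energy) for s in starts}
    minima_random = {random_descent(s, energy, rng) for s in starts}
    print(len(minima_steepest), "minima via steepest descent")
    print(len(minima_random), "minima via randomised descent")
```

Randomised descent avoids scoring every neighbour at each step, which is where a per-sample saving could come from. Because different random paths from the same start can end in different basins, it may also reach local minima that steepest descent from the same samples never visits.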
BacHbpred: Support Vector Machine Methods for the Prediction of Bacterial Hemoglobin-Like Proteins Mon, 29 Feb 2016 17:23:34 +0000 The recent upsurge in microbial genome data has revealed that hemoglobin-like (HbL) proteins may be widely distributed among bacteria and that some organisms may carry more than one HbL encoding gene. However, the discovery of HbL proteins has been limited to a small number of bacteria only. This study describes the prediction of HbL proteins and their domain classification using a machine learning approach. Support vector machine (SVM) models were developed for predicting HbL proteins based upon amino acid composition (AC), dipeptide composition (DC), hybrid method (AC + DC), and position specific scoring matrix (PSSM). In addition, we introduce for the first time a new prediction method based on max to min amino acid residue (MM) profiles. The average accuracy, standard deviation (SD), false positive rate (FPR), confusion matrix, and receiver operating characteristic (ROC) were analyzed. We also compared the performance of our proposed models in homology detection databases. The performance of the different approaches was estimated using fivefold cross-validation techniques. Prediction accuracy was further investigated through confusion matrix and ROC curve analysis. All experimental results indicate that the proposed BacHbpred can be a promising predictor for the determination of HbL-related proteins. BacHbpred, a web tool, has been developed for HbL prediction. MuthuKrishnan Selvaraj, Munish Puri, Kanak L. Dikshit, and Christophe Lefevre Copyright © 2016 MuthuKrishnan Selvaraj et al. All rights reserved.
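The AC, DC, and hybrid (AC + DC) encodings named in the BacHbpred abstract above are commonly computed as fractional counts, as in the rough sketch below. The abstract does not give the exact normalisation or residue handling used in BacHbpred, so the definitions here are the usual textbook ones and should be read as assumptions. The function names are hypothetical, and skipping non-standard residues is a choice made for this sketch.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def amino_acid_composition(seq):
    """AC: fraction of each standard residue (20 values)."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    total = len(residues)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(aa) / total for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """DC: fraction of each adjacent residue pair (400 values).

    Pairs containing a non-standard residue are skipped (an assumption).
    """
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for first, second in zip(seq, seq[1:]):
        pair = first + second
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[dp] / total if total else 0.0 for dp in DIPEPTIDES]


def hybrid_features(seq):
    """Hybrid encoding: AC followed by DC (420 values)."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)


if __name__ == "__main__":
    example = "MKVLAAGIVGL"  # made-up sequence, not a real HbL protein
    features = hybrid_features(example)
    print(len(features), "features")
```

Fixed-length vectors of this kind can be passed to any standard SVM implementation. The PSSM and MM profile encodings mentioned in the abstract need more than the raw sequence and are not sketched here.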
In Silico Approach for SAR Analysis of the Predicted Model of DEPDC1B: A Novel Target for Oral Cancer Mon, 29 Feb 2016 14:19:51 +0000 With the incidence rate of oral carcinogenesis increasing in the Southeast-Asian countries, due to increase in the consumption of tobacco and betel quid as well as infection from human papillomavirus, specifically type 16, it becomes crucial to predict the transition of premalignant lesion to cancerous tissue at an initial stage in order to control the process of oncogenesis. DEPDC1B, downregulated in the presence of E2 protein, was recently found to be overexpressed in oral cancer, which can possibly be explained by the disruption of the E2 open reading frame upon the integration of viral genome into the host genome. DEPDC1B mediates its effect by directly interacting with Rac1 protein, which is known to regulate important cell signaling pathways. Therefore, DEPDC1B can be a potential biomarker as well as a therapeutic target for diagnosing and curing the disease. However, the lack of 3D model of the structure makes the utilization of DEPDC1B as a therapeutic target difficult. The present study focuses on the prediction of a suitable 3D model of the protein as well as the analysis of protein-protein interaction between DEPDC1B and Rac1 protein using PatchDock web server along with the identification of allosteric or regulatory sites of DEPDC1B. Palak Ahuja and Kailash Singh Copyright © 2016 Palak Ahuja and Kailash Singh. All rights reserved. Large-Scale Recurrent Neural Network Based Modelling of Gene Regulatory Network Using Cuckoo Search-Flower Pollination Algorithm Tue, 16 Feb 2016 12:51:16 +0000 The accurate prediction of genetic networks using computational tools is one of the greatest challenges in the postgenomic era. Recurrent Neural Network is one of the most popular but simple approaches to model the network dynamics from time-series microarray data. 
To date, it has been successfully applied to computationally derive small-scale artificial and real-world genetic networks with high accuracy. However, they underperformed for large-scale genetic networks. Here, a new methodology has been proposed where a hybrid Cuckoo Search-Flower Pollination Algorithm has been implemented with Recurrent Neural Network. Cuckoo Search is used to search the best combination of regulators. Moreover, Flower Pollination Algorithm is applied to optimize the model parameters of the Recurrent Neural Network formalism. Initially, the proposed method is tested on a benchmark large-scale artificial network for both noiseless and noisy data. The results obtained show that the proposed methodology is capable of increasing the inference of correct regulations and decreasing false regulations to a high degree. Secondly, the proposed methodology has been validated against the real-world dataset of the DNA SOS repair network of Escherichia coli. However, the proposed method sacrifices computational time complexity in both cases due to the hybrid optimization process. Sudip Mandal, Abhinandan Khan, Goutam Saha, and Rajat K. Pal Copyright © 2016 Sudip Mandal et al. All rights reserved. Inhibition of Mycobacterium-RmlA by Molecular Modeling, Dynamics Simulation, and Docking Sun, 14 Feb 2016 09:30:49 +0000 The increasing resistance to anti-tb drugs has enforced strategies for finding new drug targets against Mycobacterium tuberculosis (Mtb). In recent years enzymes associated with the rhamnose pathway in Mtb have attracted attention as drug targets. The present work is on α-D-glucose-1-phosphate thymidylyltransferase (RmlA), the first enzyme involved in the biosynthesis of L-rhamnose, of Mtb cell wall. This study aims to derive a 3D structure of RmlA by using a comparative modeling approach. Structural refinement and energy minimization of the built model have been done with molecular dynamics. 
The reliability assessment of the built model was carried out with various protein checking tools such as Procheck, Whatif, ProsA, Errat, and Verify 3D. The obtained model investigates the relation between the structure and function. Molecular docking interactions of Mtb-RmlA with modified EMB (ethambutol) ligands and natural substrate have revealed specific key residues Arg13, Lys23, Asn109, and Thr223 which play an important role in ligand binding and selection. Compared to all EMB ligands, EMB-1 has shown better interaction with Mtb-RmlA model. The information thus discussed above will be useful for the rational design of safe and effective inhibitors specific to RmlA enzyme pertaining to the treatment of tuberculosis. N. Harathi, Madhusudana Pulaganti, C. M. Anuradha, and Suresh Kumar Chitta Copyright © 2016 N. Harathi et al. All rights reserved. In Silico Phylogenetic Analysis and Molecular Modelling Study of 2-Haloalkanoic Acid Dehalogenase Enzymes from Bacterial and Fungal Origin Wed, 06 Jan 2016 11:58:08 +0000 2-Haloalkanoic acid dehalogenase enzymes have broad range of applications, starting from bioremediation to chemical synthesis of useful compounds that are widely distributed in fungi and bacteria. In the present study, a total of 81 full-length protein sequences of 2-haloalkanoic acid dehalogenase from bacteria and fungi were retrieved from NCBI database. Sequence analysis such as multiple sequence alignment (MSA), conserved motif identification, computation of amino acid composition, and phylogenetic tree construction were performed on these primary sequences. From MSA analysis, it was observed that the sequences share conserved lysine (K) and aspartate (D) residues in them. Also, phylogenetic tree indicated a subcluster comprised of both fungal and bacterial species. 
Due to nonavailability of experimental 3D structure for fungal 2-haloalkanoic acid dehalogenase in the PDB, molecular modelling study was performed for both fungal and bacterial sources of enzymes present in the subcluster. Further structural analysis revealed a common evolutionary topology shared between both fungal and bacterial enzymes. Studies on the buried amino acids showed highly conserved Leu and Ser in the core, despite variation in their amino acid percentage. Additionally, a surface exposed tryptophan was conserved in all of these selected models. Raghunath Satpathy, V. B. Konkimalla, and Jagnyeswar Ratha Copyright © 2016 Raghunath Satpathy et al. All rights reserved. FN-Identify: Novel Restriction Enzymes-Based Method for Bacterial Identification in Absence of Genome Sequencing Thu, 31 Dec 2015 06:00:15 +0000 Sequencing and restriction analysis of genes like 16S rRNA and HSP60 are intensively used for molecular identification in the microbial communities. With aid of the rapid progress in bioinformatics, genome sequencing became the method of choice for bacterial identification. However, the genome sequencing technology is still out of reach in the developing countries. In this paper, we propose FN-Identify, a sequencing-free method for bacterial identification. FN-Identify exploits the gene sequences data available in GenBank and other databases and the two algorithms that we developed, CreateScheme and GeneIdentify, to create a restriction enzyme-based identification scheme. FN-Identify was tested using three different and diverse bacterial populations (members of Lactobacillus, Pseudomonas, and Mycobacterium groups) in an in silico analysis using restriction enzymes and sequences of 16S rRNA gene. 
The analysis of the restriction maps of the members of three groups using the fragment numbers information only or along with fragments sizes successfully identified all of the members of the three groups using a minimum of four and maximum of eight restriction enzymes. Our results demonstrate the utility and accuracy of FN-Identify method and its two algorithms as an alternative method that uses the standard microbiology laboratories techniques when the genome sequencing is not available. Mohamed Awad, Osama Ouda, Ali El-Refy, Fawzy A. El-Feky, Kareem A. Mosa, and Mohamed Helmy Copyright © 2015 Mohamed Awad et al. All rights reserved. Local Mutational Pressures in Genomes of Zaire Ebolavirus and Marburg Virus Sun, 20 Dec 2015 12:55:49 +0000 Heterogeneities in nucleotide content distribution along the length of Zaire ebolavirus and Marburg virus genomes have been analyzed. Results showed that there is asymmetric mutational A-pressure in the majority of Zaire ebolavirus genes; there is mutational AC-pressure in the coding region of the matrix protein VP40, probably, caused by its high expression at the end of the infection process; there is also AC-pressure in the 3′-part of the nucleoprotein (NP) coding gene associated with low amount of secondary structure formed by the 3′-part of its mRNA; in the middle of the glycoprotein (GP) coding gene that kind of mutational bias is linked with the high amount of secondary structure formed by the corresponding fragment of RNA negative (−) strand; there is relatively symmetric mutational AU-pressure in the polymerase (Pol) coding gene caused by its low expression level. In Marburg virus all genes, including C-rich fragment of GP coding region, demonstrate asymmetric mutational A-bias, while the last gene (Pol) demonstrates more symmetric mutational AU-pressure. 
The hypothesis of a newly synthesized RNA negative (−) strand shielding by complementary fragments of mRNAs has been described in this work: shielded fragments of RNA negative (−) strand should be better protected from oxidative damage and prone to ADAR-editing. Vladislav Victorovich Khrustalev, Eugene Victorovich Barkovsky, and Tatyana Aleksandrovna Khrustaleva Copyright © 2015 Vladislav Victorovich Khrustalev et al. All rights reserved. HBS-Tools for Hairpin Bisulfite Sequencing Data Processing and Analysis Sun, 20 Dec 2015 08:55:29 +0000 The emerging genome-wide hairpin bisulfite sequencing (hairpin-BS-Seq) technique enables the determination of the methylation pattern for DNA double strands simultaneously. Compared with traditional bisulfite sequencing (BS-Seq) techniques, hairpin-BS-Seq can determine methylation fidelity and increase mapping efficiency. However, no computational tool has been designed for the analysis of hairpin-BS-Seq data yet. Here we present HBS-tools, a set of command line based tools for the preprocessing, mapping, methylation calling, and summarizing of genome-wide hairpin-BS-Seq data. It accepts paired-end hairpin-BS-Seq reads to recover the original (pre-bisulfite-converted) sequences using global alignment and then calls the methylation statuses for cytosines on both DNA strands after mapping the original sequences to the reference genome. After applying to hairpin-BS-Seq datasets, we found that HBS-tools have a reduced mapping time and improved mapping efficiency compared with state-of-the-art mapping tools. The HBS-tools source scripts, along with user guide and testing data, are freely available for download. Ming-an Sun, Karthik Raja Velmurugan, David Keimig, and Hehuang Xie Copyright © 2015 Ming-an Sun et al. All rights reserved. Developing of the Computer Method for Annotation of Bacterial Genes Sun, 06 Dec 2015 13:58:51 +0000 Over the last years a great number of bacterial genomes were sequenced. 
Now one of the most important challenges of computational genomics is the functional annotation of nucleic acid sequences. In this study we present a computational method and an annotation system for predicting biological functions using phylogenetic profiles. The phylogenetic profile of a gene was created by searching for similarities between the nucleotide sequence of the gene and 1204 reference genomes, with further estimation of the statistical significance of the similarities found. The profiles of the genes with known functions were used for prediction of possible functions and functional groups for the new genes. We conducted the functional annotation for genes from 104 bacterial genomes and compared the functions predicted by our system with the already known functions. For the genes that have already been annotated, the known function matched the function we predicted 63% of the time, and 86% of the time the known function was found within the top five predicted functions. In addition, our system increased the share of annotated genes by 19%. The developed system may be used as an alternative or complementary system to the current annotation systems. Mikhail A. Golyshev and Eugene V. Korotkov Copyright © 2015 Mikhail A. Golyshev and Eugene V. Korotkov. All rights reserved. Evaluation of Docking Target Functions by the Comprehensive Investigation of Protein-Ligand Energy Minima Thu, 26 Nov 2015 12:56:29 +0000 The adequate choice of the docking target function impacts the accuracy of the ligand positioning as well as the accuracy of the protein-ligand binding energy calculation. To evaluate a docking target function we compared positions of its minima with the experimentally known pose of the ligand in the protein active site. We evaluated five docking target functions based on either the MMFF94 force field or the PM7 quantum-chemical method with or without implicit solvent models: PCM, COSMO, and SGB.
Each function was tested on the same set of 16 protein-ligand complexes. For exhaustive low-energy minima search the novel MPI parallelized docking program FLM and large supercomputer resources were used. Protein-ligand binding energies calculated using low-energy minima were compared with experimental values. It was demonstrated that the docking target function on the base of the MMFF94 force field in vacuo can be used for discovery of native or near native ligand positions by finding the low-energy local minima spectrum of the target function. The importance of solute-solvent interaction for the correct ligand positioning is demonstrated. It is shown that docking accuracy can be improved by replacement of the MMFF94 force field by the new semiempirical quantum-chemical PM7 method. Igor V. Oferkin, Ekaterina V. Katkova, Alexey V. Sulimov, Danil C. Kutov, Sergey I. Sobolev, Vladimir V. Voevodin, and Vladimir B. Sulimov Copyright © 2015 Igor V. Oferkin et al. All rights reserved. In Silico Investigation of Flavonoids as Potential Trypanosomal Nucleoside Hydrolase Inhibitors Thu, 12 Nov 2015 06:37:53 +0000 Human African Trypanosomiasis is endemic to 37 countries of sub-Saharan Africa. It is caused by two related species of Trypanosoma brucei. Current therapies suffer from resistance and public accessibility of expensive medicines. Finding safer and effective therapies of natural origin is being extensively explored worldwide. Pentamidine is the only available therapy for inhibiting the P2 adenosine transporter involved in the purine salvage pathway of the trypanosomatids. The objective of the present study is to use computational studies for the investigation of the probable trypanocidal mechanism of flavonoids. Docking experiments were carried out on eight flavonoids of varying level of hydroxylation, namely, flavone, 5-hydroxyflavone, 7-hydroxyflavone, chrysin, apigenin, kaempferol, fisetin, and quercetin. 
Using AutoDock 4.2, these compounds were tested for their affinity towards inosine-adenosine-guanosine nucleoside hydrolase and inosine-guanosine nucleoside hydrolase, the major enzymes of the purine salvage pathway. Our results showed that all eight tested flavonoids had high affinities for both hydrolases (lowest free binding energy ranging from −10.23 to −7.14 kcal/mol). These compounds, especially the hydroxylated derivatives, could be further studied as potential inhibitors of the nucleoside hydrolases. Christina Hung Hung Ha, Ayesha Fatima, and Anand Gaurav Copyright © 2015 Christina Hung Hung Ha et al. All rights reserved. High-Throughput Quantification of Phenotype Heterogeneity Using Statistical Features Tue, 20 Oct 2015 11:56:39 +0000 Statistical features are widely used in radiology for tumor heterogeneity assessment using magnetic resonance (MR) imaging. In this paper, feature selection based on a decision tree is examined to determine the relevant subset of glioblastoma (GBM) phenotypes in the statistical domain. To discriminate between the active tumor (vAT) and edema/invasion (vE) phenotypes, we selected the significant features using analysis of variance (ANOVA) with p value < 0.01. Then, we implemented the decision tree to define the optimal feature subset for the phenotype classifier. Naïve Bayes (NB), support vector machine (SVM), and decision tree (DT) classifiers were considered to evaluate the performance of the feature-based scheme in terms of its capability to discriminate vAT from vE. All nine features were statistically significant for distinguishing vAT from vE, with p value < 0.01. Feature selection based on the decision tree showed the best performance in the comparative study with the full feature set. The selection showed that two features, kurtosis and skewness, achieved the highest ranges: 58.33–75.00% classifier accuracy and 73.88–92.50% AUC. 
This study demonstrated the ability of statistical features to provide a quantitative, individualized measurement for glioblastoma patients and to assess phenotype progression. Ahmad Chaddad and Camel Tanougast Copyright © 2015 Ahmad Chaddad and Camel Tanougast. All rights reserved. Identification of Novel Inhibitors for Tobacco Mosaic Virus Infection in Solanaceae Plants Sun, 18 Oct 2015 14:29:45 +0000 Tobacco mosaic virus (TMV) infects several crops of economic importance (e.g., tomato) and remains one of the major concerns for farmers. TMV enters the host cell and produces the capping enzyme RNA polymerase. The viral genome replicates further to produce multiple mRNAs which encode several proteins, including the coat protein and an RNA-dependent RNA polymerase (RdRp), as well as the movement protein. The TMV replicase domain was chosen for virtual screening studies against small molecules derived from ligand databases such as PubChem and ChemBank. Catalytic sites of the RdRp domain were identified and subjected to docking analysis with screened ligands derived from virtual screening with LigandFit. Small molecules that interact with the target molecule at the catalytic domain region amino acids, GDD, were chosen as the best inhibitors for controlling the TMV replicase activity. Archana Prabahar, Subashini Swaminathan, Arul Loganathan, and Ramalingam Jegadeesan Copyright © 2015 Archana Prabahar et al. All rights reserved. Discovering Alzheimer Genetic Biomarkers Using Bayesian Networks Sun, 23 Aug 2015 09:30:44 +0000 Single nucleotide polymorphisms (SNPs) contribute most of the genetic variation to the human genome. SNPs are associated with many complex and common diseases, such as Alzheimer’s disease (AD). Discovering SNP biomarkers at different loci can improve early diagnosis and treatment of these diseases. Bayesian networks provide a comprehensible and modular framework for representing interactions between genes or single SNPs. 
Here, different Bayesian network structure learning algorithms were applied to whole genome sequencing (WGS) data to detect the causal AD SNPs and gene-SNP interactions. We focused on polymorphisms in the top ten genes associated with AD, as identified by genome-wide association (GWA) studies. New SNP biomarkers were observed to be significantly associated with Alzheimer’s disease. These SNPs are rs7530069, rs113464261, rs114506298, rs73504429, rs7929589, rs76306710, and rs668134. The results demonstrated the effectiveness of using BN for identifying AD causal SNPs with acceptable accuracy. The results indicate that the SNP set detected by Markov blanket based methods has a strong association with AD and achieves better performance than both naïve Bayes and tree augmented naïve Bayes. The minimal augmented Markov blanket reaches an accuracy of 66.13% and a sensitivity of 88.87%, versus 61.58% and 59.43%, respectively, for naïve Bayes. Fayroz F. Sherif, Nourhan Zayed, and Mahmoud Fakhr Copyright © 2015 Fayroz F. Sherif et al. All rights reserved. Tyrosine Kinase Ligand-Receptor Pair Prediction by Using Support Vector Machine Tue, 11 Aug 2015 11:22:47 +0000 Receptor tyrosine kinases are essential proteins involved in cellular differentiation and proliferation in vivo and are heavily involved in allergic diseases, diabetes, and the onset/proliferation of cancerous cells. Identifying the interacting partner of such a protein, a growth factor ligand, will provide a deeper understanding of cellular proliferation/differentiation and other cell processes. In this study, we developed a method for predicting tyrosine kinase ligand-receptor pairs from their amino acid sequences. We collected tyrosine kinase ligand-receptor pairs from the Database of Interacting Proteins (DIP) and UniProtKB, filtered them by removing sequence redundancy, and used them as a dataset for machine learning and assessment of predictive performance. 
Our prediction method is based on support vector machines (SVMs); we evaluated several input features suitable for tyrosine kinases and compared and analyzed the results. Using sequence pattern information and domain information extracted from sequences as input features, we obtained an area under the receiver operating characteristic curve of 0.996. This accuracy is higher than that obtained from general protein-protein interaction pair predictions. Masayuki Yarimizu, Cao Wei, Yusuke Komiyama, Kokoro Ueki, Shugo Nakamura, Kazuya Sumikoshi, Tohru Terada, and Kentaro Shimizu Copyright © 2015 Masayuki Yarimizu et al. All rights reserved. A Review of Feature Selection and Feature Extraction Methods Applied on Microarray Data Thu, 11 Jun 2015 13:48:13 +0000 We summarise various ways of performing dimensionality reduction on high-dimensional microarray data. Many different feature selection and feature extraction methods exist and they are widely used. All these methods aim to remove redundant and irrelevant features so that classification of new instances will be more accurate. A popular source of data is microarrays, a biological platform for gathering gene expressions. Analysing microarrays can be difficult due to the size of the data they provide. In addition, the complicated relations among the different genes make analysis more difficult, and removing excess features can improve the quality of the results. We present some of the most popular methods for selecting significant features and provide a comparison between them. Their advantages and disadvantages are outlined in order to provide a clearer idea of when to use each of them to save computational time and resources. Zena M. Hira and Duncan F. Gillies Copyright © 2015 Zena M. Hira and Duncan F. Gillies. All rights reserved. Semantic Annotation for Biological Information Retrieval System Mon, 09 Feb 2015 06:37:38 +0000 Online literature is growing at a tremendous rate. 
The biological domain is one of the fastest growing domains. Biological researchers have difficulty finding what they are searching for effectively and efficiently. The aim of this research is to find documents that contain any combination of biological process and/or molecular function and/or cellular component. This research proposes a framework that helps researchers retrieve meaningful documents related to their asserted terms based on gene ontology (GO). The system utilizes GO by semantically decomposing it into three subontologies (cellular component, biological process, and molecular function). Researchers have the flexibility to choose search terms from any combination of the three subontologies. Document annotation is used in this research to create an index of biological terms in documents to speed up the search process. Query expansion is used to infer terms semantically related to the asserted terms. It increases the number of meaningful search results by using term synonyms and term relationships. The system uses a ranking method to order the retrieved documents based on the ranking weights. The proposed system meets researchers’ needs to find documents that fit the asserted terms semantically. Mohamed Marouf Z. Oshaiba, Enas M. F. El Houby, and Akram Salah Copyright © 2015 Mohamed Marouf Z. Oshaiba et al. All rights reserved. A Highly Conserved GEQYQQLR Epitope Has Been Identified in the Nucleoprotein of Ebola Virus by Using an In Silico Approach Sun, 01 Feb 2015 09:50:26 +0000 Ebola virus (EBOV) is a deadly virus that has caused several fatal outbreaks. Recently it caused another outbreak that resulted in thousands of cases. An effective, approved vaccine or therapeutic treatment against this virus is still lacking. In this study, we aimed to predict B-cell epitopes from several EBOV-encoded proteins which may aid in developing new antibody-based therapeutics or viral antigen detection methods against this virus. 
Multiple sequence alignment (MSA) was performed to identify conserved regions among the glycoprotein (GP), nucleoprotein (NP), and viral structural proteins (VP40, VP35, and VP24) of EBOV. Next, different consensus immunogenic and conserved sites were predicted from the conserved region(s) using various computational tools available in the Immune Epitope Database (IEDB). Among the GP, NP, VP40, VP35, and VP30 proteins, only NP gave a 100% conserved GEQYQQLR B-cell epitope that fulfills the ideal features of an effective B-cell epitope and could open a path toward Ebola treatment. However, successful in vivo and in vitro studies are a prerequisite for determining the actual potency of our predicted epitope and establishing it as a preventive medication against all the fatal strains of EBOV. Mohammad Tuhin Ali and Md Ohedul Islam Copyright © 2015 Mohammad Tuhin Ali and Md Ohedul Islam. All rights reserved. Development of a Machine Learning Method to Predict Membrane Protein-Ligand Binding Residues Using Basic Sequence Information Sat, 31 Jan 2015 08:20:58 +0000 Locating ligand binding sites and finding the functionally important residues from protein sequences as well as structures has become one of the challenges in understanding protein function. Hence a Naïve Bayes classifier has been trained to predict whether a given amino acid residue in a membrane protein sequence is a ligand binding residue or not, using only sequence-based information. The input to the classifier consists of the features of the target residue and two sequence neighbors on each side of the target residue. The classifier is trained and evaluated on a nonredundant set of 42 sequences (chains with at least one transmembrane domain) from 31 alpha-helical membrane proteins. The classifier achieves an overall accuracy of 70.7% with 72.5% specificity and 61.1% sensitivity in identifying ligand binding residues from sequence. 
The classifier performs better when the sequence is encoded by PSI-BLAST-generated PSSM profiles. Assessment of the predictions in the context of three-dimensional structures of proteins reveals the effectiveness of this method in identifying ligand binding sites from sequence information. In 83.3% (35 out of 42) of the proteins, the classifier identifies the ligand binding sites by correctly recognizing more than half of the binding residues. This will be useful to protein engineers in exploiting potential residues for functional assessment. M. Xavier Suresh, M. Michael Gromiha, and Makiko Suwa Copyright © 2015 M. Xavier Suresh et al. All rights reserved. PhosphoHunter: An Efficient Software Tool for Phosphopeptide Identification Mon, 12 Jan 2015 12:58:05 +0000 Phosphorylation is a protein posttranslational modification. It is responsible for the activation/inactivation of disease-related pathways, thanks to its role as a “molecular switch.” The study of phosphorylated proteins has become a key point for proteomic analyses focused on the identification of diagnostic/therapeutic targets. Liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) is the most widely used analytical approach. Although unmodified peptides are automatically identified by consolidated algorithms, phosphopeptides still require automated tools to avoid time-consuming manual interpretation. To improve phosphopeptide identification efficiency, a novel procedure was developed and implemented in a Perl/C tool called PhosphoHunter, here proposed and evaluated. It includes a preliminary heuristic step for filtering out the MS/MS spectra produced by nonphosphorylated peptides before sequence identification. A method to assess the statistical significance of identified phosphopeptides was also formulated. PhosphoHunter performance was tested on a dataset of 1500 MS/MS spectra and compared with two other tools: Mascot and Inspect. 
Comparisons demonstrated that sensitivity is a strong point of PhosphoHunter, suggesting that it is able to identify real phosphopeptides with superior performance. Performance indexes depend on a single parameter (intensity threshold) that users can tune according to the study aim. All three tools localized >90% of phosphosites. Alessandra Tiengo, Lorenzo Pasotti, Nicola Barbarini, and Paolo Magni Copyright © 2015 Alessandra Tiengo et al. All rights reserved.