Advances in Bioinformatics The latest articles from Hindawi Publishing Corporation © 2014, Hindawi Publishing Corporation. All rights reserved. Artificial Neural Network Application in the Diagnosis of Disease Conditions with Liver Ultrasound Images Tue, 16 Sep 2014 00:00:00 +0000 The preliminary study presented in this paper compares various texture features extracted from liver ultrasonic images, employing a Multilayer Perceptron (MLP), a type of artificial neural network, to study the presence of disease conditions. An ultrasound (US) image shows echo-texture patterns, which define the organ characteristics. Ultrasound images of liver disease conditions such as “fatty liver,” “cirrhosis,” and “hepatomegaly” produce distinctive echo patterns. However, various ultrasound imaging artifacts and speckle noise make these echo-texture patterns difficult to identify and often hard to distinguish visually. Here, based on the features extracted from the ultrasonic images, we employed an artificial neural network to diagnose disease conditions in the liver and to find the best classifier that distinguishes between abnormal and normal conditions of the liver. Comparison of the overall performance of all the feature classifiers showed that the “mixed feature set” is the best feature set. It showed an excellent rate of accuracy for the training data set. The gray level run length matrix (GLRLM) feature showed better results when the network was tested against unknown data. Karthik Kalyan, Binal Jakhia, Ramachandra Dattatraya Lele, Mukund Joshi, and Abhay Chowdhary Copyright © 2014 Karthik Kalyan et al. All rights reserved. Breast Cancer Nodes Detection Using Ultrasonic Microscale Subarrayed MIMO RADAR Mon, 15 Sep 2014 08:37:25 +0000 This paper proposes the use of ultrasonic microscale subarrayed MIMO RADARs to estimate the position of breast cancer nodes. The transmit and receive antenna arrays are divided into subarrays. 
In order to increase the signal diversity, each subarray is assigned a different waveform from an orthogonal set. High-frequency ultrasonic transducers are used since a breast is considered to be a superficial structure. Closed form expressions for the optimal Neyman-Pearson detector are derived. The combination of the waveform diversity present in the subarrayed deployment and traditional phased-array RADAR techniques provides promising results. Attaphongse Taparugssanagorn, Siwaruk Siwamogsatham, and Carlos Pomalaza-Ráez Copyright © 2014 Attaphongse Taparugssanagorn et al. All rights reserved. Utilization of Boron Compounds for the Modification of Suberoyl Anilide Hydroxamic Acid as Inhibitor of Histone Deacetylase Class II Homo sapiens Sun, 24 Aug 2014 10:43:10 +0000 Histone deacetylase (HDAC) has a critical function in regulating gene expression. The inhibition of HDAC has developed as an interesting anticancer research area that targets biological processes such as cell cycle, apoptosis, and cell differentiation. In this study, a commercially available HDAC inhibitor, suberoyl anilide hydroxamic acid (SAHA), has been modified to improve its efficacy and reduce the side effects of the compound. The hydrophobic cap and zinc-binding group of these compounds were substituted with boron-based compounds, whereas the linker region was substituted with p-aminobenzoic acid. The molecular docking analysis resulted in 8 ligands with ΔG value more negative than the standards, SAHA and trichostatin A (TSA). Those ligands were then analyzed in terms of QSAR, pharmacological properties, and ADME-Tox in order to obtain a potent inhibitor of HDAC class II Homo sapiens. The screening process yielded one best ligand, Nova2 (513246-99-6), which was then further studied by molecular dynamics simulations. Ridla Bakri, Arli Aditya Parikesit, Cipta Prio Satriyanto, Djati Kerami, and Usman Sumo Friend Tambunan Copyright © 2014 Ridla Bakri et al. 
All rights reserved. AUTO-MUTE 2.0: A Portable Framework with Enhanced Capabilities for Predicting Protein Functional Consequences upon Mutation Sun, 17 Aug 2014 08:33:10 +0000 The AUTO-MUTE 2.0 stand-alone software package includes a collection of programs for predicting functional changes to proteins upon single residue substitutions, developed by combining structure-based features with trained statistical learning models. Three of the predictors evaluate changes to protein stability upon mutation, each complementing a distinct experimental approach. Two additional classifiers are available, one for predicting activity changes due to residue replacements and the other for determining the disease potential of mutations associated with nonsynonymous single nucleotide polymorphisms (nsSNPs) in human proteins. These five command-line driven tools, as well as all the supporting programs, complement those that run our AUTO-MUTE web-based server. Nevertheless, all the codes have been rewritten and substantially altered for the new portable software, and they incorporate several new features based on user feedback. Included among these upgrades is the ability to perform three highly requested tasks: to run “big data” batch jobs; to generate predictions using modified protein data bank (PDB) structures, and unpublished personal models prepared using standard PDB file formatting; and to utilize NMR structure files that contain multiple models. Majid Masso and Iosif I. Vaisman Copyright © 2014 Majid Masso and Iosif I. Vaisman. All rights reserved. Multiplex Degenerate Primer Design for Targeted Whole Genome Amplification of Many Viral Genomes Sun, 03 Aug 2014 10:37:23 +0000 Background. Targeted enrichment improves coverage of highly mutable viruses at low concentration in complex samples. Degenerate primers that anneal to conserved regions can facilitate amplification of divergent, low concentration variants, even when the strain present is unknown. Results. 
A tool for designing multiplex sets of degenerate sequencing primers to tile overlapping amplicons across multiple whole genomes is described. The new script, run_tiled_primers, is part of the PriMux software. Primers were designed for each segment of South American hemorrhagic fever viruses, tick-borne encephalitis, Henipaviruses, Arenaviruses, Filoviruses, Crimean-Congo hemorrhagic fever virus, Rift Valley fever virus, and Japanese encephalitis virus. Each group is highly diverse with as little as 5% genome consensus. Primer sets were computationally checked for nontarget cross reactions against the NCBI nucleotide sequence database. Primers for murine hepatitis virus were demonstrated in the lab to specifically amplify selected genes from a laboratory cultured strain that had undergone extensive passage in vitro and in vivo. Conclusions. This software should help researchers design multiplex sets of primers for targeted whole genome enrichment prior to sequencing to obtain better coverage of low titer, divergent viruses. Applications include viral discovery from a complex background and improved sensitivity and coverage of rapidly evolving strains or variants in a gene family. Shea N. Gardner, Crystal J. Jaing, Maher M. Elsheikh, José Peña, David A. Hysom, and Monica K. Borucki Copyright © 2014 Shea N. Gardner et al. All rights reserved. Prediction of Epitope-Based Peptides for the Utility of Vaccine Development from Fusion and Glycoprotein of Nipah Virus Using In Silico Approach Thu, 24 Jul 2014 07:20:33 +0000 This study aims to design epitope-based peptides for the utility of vaccine development by targeting glycoprotein G and envelope protein F of Nipah virus (NiV) that, respectively, facilitate attachment and fusion of NiV with host cells. Using various databases and tools, immune parameters of conserved sequence(s) from G and F proteins of different isolates of NiV were tested to predict probable epitope(s). 
Binding analyses of the peptides with MHC class-I and class-II molecules, epitope conservancy, population coverage, and linear B cell epitope prediction were analyzed. Predicted peptides interacted with seven or more MHC alleles and illustrated population coverage of more than 99% and 95%, for G and F proteins, respectively. The predicted class-I nonamers, SLIDTSSTI and EWISIVPNF, superimposed on the putative decameric B cell epitopes, were also identified as core sequences of the most probable class-II 15-mer peptides GPKVSLIDTSSTITI and EWISIVPNFILVRNT. These peptides were further validated for their binding to specific HLA alleles using in silico docking technique. Our in silico analysis suggested that the predicted epitopes, either GPKVSLIDTSSTITI or EWISIVPNFILVRNT, could be a better choice as universal vaccine component against NiV irrespective of different isolates which may elicit both humoral and cell-mediated immunity. M. Sadman Sakib, Md. Rezaul Islam, A. K. M. Mahbub Hasan, and A. H. M. Nurun Nabi Copyright © 2014 M. Sadman Sakib et al. All rights reserved. IN-MACA-MCC: Integrated Multiple Attractor Cellular Automata with Modified Clonal Classifier for Human Protein Coding and Promoter Prediction Tue, 15 Jul 2014 09:40:11 +0000 Protein coding and promoter region predictions are very important challenges of bioinformatics (Attwood and Teresa, 2000). The identification of these regions plays a crucial role in understanding the genes. Many novel computational and mathematical methods are introduced as well as existing methods that are getting refined for predicting both of the regions separately; still there is a scope for improvement. We propose a classifier that is built with MACA (multiple attractor cellular automata) and MCC (modified clonal classifier) to predict both regions with a single classifier. 
The proposed classifier is trained and tested with Fickett and Tung (1992) datasets for protein coding region prediction for DNA sequences of lengths 54, 108, and 162. This classifier is trained and tested with MMCRI datasets for protein coding region prediction for DNA sequences of lengths 252 and 354. The proposed classifier is trained and tested with promoter sequences from DBTSS (Yamashita et al., 2006) dataset and nonpromoters from EID (Saxonov et al., 2000) and UTRdb (Pesole et al., 2002) datasets. The proposed model can predict both regions with an average accuracy of 90.5% for promoter and 89.6% for protein coding region predictions. The specificity and sensitivity values of promoter and protein coding region predictions are 0.89 and 0.92, respectively. Kiran Sree Pokkuluri, Ramesh Babu Inampudi, and S. S. S. N. Usha Devi Nedunuri Copyright © 2014 Kiran Sree Pokkuluri et al. All rights reserved. Pharmacophore Modeling and Molecular Docking Studies on Pinus roxburghii as a Target for Diabetes Mellitus Thu, 10 Jul 2014 10:07:59 +0000 The present study attempts to establish a relationship between ethnopharmacological claims and bioactive constituents present in Pinus roxburghii against all possible targets for diabetes through molecular docking and to develop a pharmacophore model for the active target. The process of molecular docking involves study of different bonding modes of one ligand with active cavities of target receptors protein tyrosine phosphatase 1-beta (PTP-1β), dipeptidyl peptidase-IV (DPP-IV), aldose reductase (AR), and insulin receptor (IR) with help of docking software Molegro virtual docker (MVD). From the results of docking score values on different receptors for antidiabetic activity, it is observed that constituents, namely, secoisoresinol, pinoresinol, and cedeodarin, showed the best docking results on almost all the receptors, while the most significant results were observed on AR. 
Then, LigandScout was applied to develop a pharmacophore model for active target. LigandScout revealed that 2 hydrogen bond donors pointing towards Tyr 48 and His 110 are a major requirement of the pharmacophore generated. In our molecular docking studies, the active constituent, secoisoresinol, has also shown hydrogen bonding with His 110 residue which is a part of the pharmacophore. The docking results have given better insights into the development of better aldose reductase inhibitor so as to treat diabetes related secondary complications. Pawan Kaushik, Sukhbir Lal Khokra, A. C. Rana, and Dhirender Kaushik Copyright © 2014 Pawan Kaushik et al. All rights reserved. How Good Are Simplified Models for Protein Structure Prediction? Tue, 29 Apr 2014 07:15:44 +0000 Protein structure prediction (PSP) has been one of the most challenging problems in computational biology for several decades. The challenge is largely due to the complexity of the all-atomic details and the unknown nature of the energy function. Researchers have therefore used simplified energy models that consider interaction potentials only between the amino acid monomers in contact on discrete lattices. The restricted nature of the lattices and the energy models poses a twofold concern regarding the assessment of the models. Can a native or a very close structure be obtained when structures are mapped to lattices? Can the contact based energy models on discrete lattices guide the search towards the native structures? In this paper, we use the protein chain lattice fitting (PCLF) problem to address the first concern; we developed a constraint-based local search algorithm for the PCLF problem for cubic and face-centered cubic lattices and found very close lattice fits for the native structures. 
For the second concern, we use a number of techniques to sample the conformation space and find correlations between energy functions and root mean square deviation (RMSD) distance of the lattice-based structures with the native structures. Our analysis reveals weakness of several contact based energy models used that are popular in PSP. Swakkhar Shatabda, M. A. Hakim Newton, Mahmood A. Rashid, Duc Nghia Pham, and Abdul Sattar Copyright © 2014 Swakkhar Shatabda et al. All rights reserved. Elementary Flux Mode Analysis of Acetyl-CoA Pathway in Carboxydothermus hydrogenoformans Z-2901 Wed, 16 Apr 2014 08:04:40 +0000 Carboxydothermus hydrogenoformans is a carboxydotrophic hydrogenogenic bacterium species that produces hydrogen molecule by utilizing carbon monoxide (CO) or pyruvate as a carbon source. To investigate the underlying biochemical mechanism of hydrogen production, an elementary mode analysis of acetyl-CoA pathway was performed to determine the intermediate fluxes by combining linear programming (LP) method available in CellNetAnalyzer software. We hypothesized that addition of enzymes necessary for carbon monoxide fixation and pyruvate dissimilation would enhance the theoretical yield of hydrogen. An in silico gene knockout of pyk, pykC, and mdh genes of modeled acetyl-CoA pathway allows the maximum theoretical hydrogen yield of 47.62 mmol/gCDW/h for 1 mole of carbon monoxide (CO) uptake. The obtained hydrogen yield is comparatively two times greater than the previous experimental data. Therefore, it could be concluded that this elementary flux mode analysis is a crucial way to achieve efficient hydrogen production through acetyl-CoA pathway and act as a model for strain improvement. Rajadurai Chinnasamy Perumal, Ashok Selvaraj, and Gopal Ramesh Kumar Copyright © 2014 Rajadurai Chinnasamy Perumal et al. All rights reserved. Objective and Comprehensive Evaluation of Bisulfite Short Read Mapping Tools Tue, 15 Apr 2014 16:28:46 +0000 Background. 
Large-scale bisulfite treatment and short-read sequencing technology allow comprehensive estimation of methylation states of Cs in the genomes of different tissues, cell types, and developmental stages. Accurate characterization of DNA methylation is essential for understanding genotype-phenotype association, gene-environment interaction, diseases, and cancer. Aligning bisulfite short reads to a reference genome has been a challenging task. We compared five bisulfite short read mapping tools, BSMAP, Bismark, BS-Seeker, BiSS, and BRAT-BW, representing two classes of mapping algorithms (hash table and suffix/prefix tries). We examined their mapping efficiency (i.e., the percentage of reads that can be mapped to the genomes), usability, running time, and effects of changing default parameter settings using both real and simulated reads. We also investigated how preprocessing data might affect mapping efficiency. Conclusion. Among the five programs compared, in terms of mapping efficiency, Bismark performs the best on the real data, followed by BiSS, BSMAP, and finally BRAT-BW and BS-Seeker with very similar performance. If CPU time is not a constraint, Bismark is a good choice of program for mapping bisulfite-treated short reads. Data quality greatly affects mapping efficiency. Although increasing the number of mismatches allowed can increase mapping efficiency, it not only significantly slows down the program, but also runs the risk of increasing false positives. Therefore, users should carefully set the related parameters depending on the quality of their sequencing data. Hong Tran, Jacob Porter, Ming-an Sun, Hehuang Xie, and Liqing Zhang Copyright © 2014 Hong Tran et al. All rights reserved. 
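The mapping efficiency metric used in the bisulfite mapping comparison above (the percentage of reads that can be mapped to the genome) can be illustrated with a minimal Python sketch. The function name, the tool labels, and the read counts are hypothetical and chosen for illustration; none of them come from the study or from any of the mapping tools it compares.

```python
def mapping_efficiency(mapped_reads: int, total_reads: int) -> float:
    """Return the percentage of reads that were mapped to the genome."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= mapped_reads <= total_reads:
        raise ValueError("mapped_reads must lie between 0 and total_reads")
    return 100.0 * mapped_reads / total_reads


# Hypothetical (mapped, total) read counts; NOT results from the study.
example_counts = {
    "tool_a": (700_000, 1_000_000),
    "tool_b": (550_000, 1_000_000),
}

# Order the hypothetical tools from highest to lowest mapping efficiency.
ranked = sorted(
    example_counts,
    key=lambda name: mapping_efficiency(*example_counts[name]),
    reverse=True,
)

if __name__ == "__main__":
    for name in ranked:
        print(f"{name}: {mapping_efficiency(*example_counts[name]):.1f}%")
```

Mapping efficiency alone says nothing about correctness: as the abstract notes, allowing more mismatches can raise this percentage while also raising the risk of false positives.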
Network Completion for Static Gene Expression Data Wed, 26 Mar 2014 11:30:15 +0000 We tackle the problem of completing and inferring genetic networks under stationary conditions from static data, where network completion is to make the minimum amount of modifications to an initial network so that the completed network is most consistent with the expression data in which addition of edges and deletion of edges are basic modification operations. For this problem, we present a new method for network completion using dynamic programming and least-squares fitting. This method can find an optimal solution in polynomial time if the maximum indegree of the network is bounded by a constant. We evaluate the effectiveness of our method through computational experiments using synthetic data. Furthermore, we demonstrate that our proposed method can distinguish the differences between two types of genetic networks under stationary conditions from lung cancer and normal gene expression data. Natsu Nakajima and Tatsuya Akutsu Copyright © 2014 Natsu Nakajima and Tatsuya Akutsu. All rights reserved. Secondary Structure Preferences of Mn2+ Binding Sites in Bacterial Proteins Mon, 17 Mar 2014 11:37:15 +0000 3D structures of proteins with coordinated Mn2+ ions from bacteria with low, average, and high genomic GC-content have been analyzed (149 PDB files were used). Major Mn2+ binders are aspartic acid (6.82% of Asp residues), histidine (14.76% of His residues), and glutamic acid (3.51% of Glu residues). We found out that the motif of secondary structure “beta strand-major binder-random coil” is overrepresented around all the three major Mn2+ binders. That motif may be followed by either alpha helix or beta strand. Beta strands near Mn2+ binding residues should be stable because they are enriched by such beta formers as valine and isoleucine, as well as by specific combinations of hydrophobic and hydrophilic amino acid residues characteristic to beta sheet. 
In the group of proteins from GC-rich bacteria glutamic acid residues situated in alpha helices frequently coordinate Mn2+ ions, probably, because of the decrease of Lys usage under the influence of mutational GC-pressure. On the other hand, the percentage of Mn2+ sites with at least one amino acid in the “beta strand-major binder-random coil” motif of secondary structure (77.88%) does not depend on genomic GC-content. Tatyana Aleksandrovna Khrustaleva Copyright © 2014 Tatyana Aleksandrovna Khrustaleva. All rights reserved. A Parallel Framework for Multipoint Spiral Search in ab Initio Protein Structure Prediction Sun, 16 Mar 2014 11:57:48 +0000 Protein structure prediction is computationally a very challenging problem. A large number of existing search algorithms attempt to solve the problem by exploring possible structures and finding the one with the minimum free energy. However, these algorithms perform poorly on large sized proteins due to an astronomically wide search space. In this paper, we present a multipoint spiral search framework that uses parallel processing techniques to expedite exploration by starting from different points. In our approach, a set of random initial solutions are generated and distributed to different threads. We allow each thread to run for a predefined period of time. The improved solutions are stored threadwise. When the threads finish, the solutions are merged together and the duplicates are removed. A selected distinct set of solutions are then split to different threads again. In our ab initio protein structure prediction method, we use the three-dimensional face-centred-cubic lattice for structure-backbone mapping. We use both the low resolution hydrophobic-polar energy model and the high-resolution energy model for search guiding. 
The experimental results show that our new parallel framework significantly improves the results obtained by the state-of-the-art single-point search approaches for both energy models on three-dimensional face-centred-cubic lattice. We also experimentally show the effectiveness of mixing energy models within parallel threads. Mahmood A. Rashid, Swakkhar Shatabda, M. A. Hakim Newton, Md Tamjidul Hoque, and Abdul Sattar Copyright © 2014 Mahmood A. Rashid et al. All rights reserved. A Brachytherapy Plan Evaluation Tool for Interstitial Applications Sun, 09 Feb 2014 05:54:49 +0000 Radiobiological metrics such as tumor control probability (TCP) and normal tissue complication probability (NTCP) help in assessing the quality of brachytherapy plans. Application of such metrics in clinics as well as research is still inadequate. This study presents the implementation of two indigenously designed plan evaluation modules: Brachy_TCP and Brachy_NTCP. Evaluation tools were constructed to compute TCP and NTCP from dose volume histograms (DVHs) of any interstitial brachytherapy treatment plan. The computation module was employed to estimate probabilities of tumor control and normal tissue complications in ten cervical cancer patients based on biologically effective equivalent uniform dose (BEEUD). The tumor control and normal tissue morbidity were assessed with clinical followup and were scored. The acute toxicity was graded using common terminology criteria for adverse events (CTCAE) version 4.0. Outcome score was found to be correlated with the TCP/NTCP estimates. Thus, the predictive ability of the estimates was quantified with the clinical outcomes. Biologically effective equivalent uniform dose-based formalism was found to be effective in predicting the complexities and disease control. Surega Anbumani, N. Arunai Nambiraj, Sridhar Dayalan, Kalaivany Ganesh, Pichandi Anchineyan, and Ramesh S. Bilimagga Copyright © 2014 Surega Anbumani et al. All rights reserved. 
Prediction of B-Cell Epitopes in Listeriolysin O, a Cholesterol Dependent Cytolysin Secreted by Listeria monocytogenes Thu, 02 Jan 2014 16:05:40 +0000 Listeria monocytogenes is a gram-positive, foodborne bacterium responsible for disease in humans and animals. Listeriolysin O (LLO) is a required virulence factor for the pathogenic effects of L. monocytogenes. Bioinformatics revealed conserved putative epitopes of LLO that could be used to develop monoclonal antibodies against LLO. Continuous and discontinuous epitopes were located by using four different B-cell prediction algorithms. Three-dimensional molecular models were generated to more precisely characterize the predicted antigenicity of LLO. Domain 4 was predicted to contain five of eleven continuous epitopes. A large portion of domain 4 was also predicted to comprise discontinuous immunogenic epitopes. Domain 4 of LLO may serve as an immunogen for eliciting monoclonal antibodies that can be used to study the pathogenesis of L. monocytogenes as well as develop an inexpensive assay. Morris S. Jones and J. Mark Carter Copyright © 2014 Morris S. Jones and J. Mark Carter. All rights reserved. Comparing Imputation Procedures for Affymetrix Gene Expression Datasets Using MAQC Datasets Wed, 09 Oct 2013 13:53:52 +0000 Introduction. The microarray datasets from the MicroArray Quality Control (MAQC) project have enabled the assessment of the precision, comparability of microarrays, and other various microarray analysis methods. However, to date no studies that we are aware of have reported the performance of missing value imputation schemes on the MAQC datasets. In this study, we use the MAQC Affymetrix datasets to evaluate several imputation procedures in Affymetrix microarrays. Results. We evaluated several cutting edge imputation procedures and compared them using different error measures. We randomly deleted 5% and 10% of the data and imputed the missing values using imputation tests. 
We performed 1000 simulations and averaged the results. The results for both 5% and 10% deletion are similar. Among the imputation methods, we observe the local least squares method with is most accurate under the error measures considered. The k-nearest neighbor method with has the highest error rate among imputation methods and error measures. Conclusions. We conclude for imputing missing values in Affymetrix microarray datasets, using the MAS 5.0 preprocessing scheme, the local least squares method with has the best overall performance and k-nearest neighbor method with has the worst overall performance. These results hold true for both 5% and 10% missing values. Sreevidya Sadananda Sadasiva Rao, Lori A. Shepherd, Andrew E. Bruno, Song Liu, and Jeffrey C. Miecznikowski Copyright © 2013 Sreevidya Sadananda Sadasiva Rao et al. All rights reserved. A Multilevel Gamma-Clustering Layout Algorithm for Visualization of Biological Networks Tue, 25 Jun 2013 15:26:07 +0000 Visualization of large complex networks has become an indispensable part of systems biology, where organisms need to be considered as one complex system. The visualization of the corresponding network is challenging due to the size and density of edges. In many cases, the use of standard visualization algorithms can lead to high running times and poorly readable visualizations due to many edge crossings. We suggest an approach that analyzes the structure of the graph first and then generates a new graph which contains specific semantic symbols for regular substructures like dense clusters. We propose a multilevel gamma-clustering layout visualization algorithm (MLGA) which proceeds in three subsequent steps: (i) a multilevel γ-clustering is used to identify the structure of the underlying network, (ii) the network is transformed to a tree, and (iii) finally, the resulting tree which shows the network structure is drawn using a variation of a force-directed algorithm. 
The algorithm has a potential to visualize very large networks because it uses modern clustering heuristics which are optimized for large graphs. Moreover, most of the edges are removed from the visual representation which allows keeping the overview over complex graphs with dense subgraphs. Tomas Hruz, Markus Wyss, Christoph Lucas, Oliver Laule, Peter von Rohr, Philip Zimmermann, and Stefan Bleuler Copyright © 2013 Tomas Hruz et al. All rights reserved. Computational and Statistical Approaches for Modeling of Proteomic and Genomic Networks Thu, 16 May 2013 09:08:24 +0000 Mohamed Nounou, Hazem Nounou, Erchin Serpedin, Aniruddha Datta, and Yufei Huang Copyright © 2013 Mohamed Nounou et al. All rights reserved. Reverse Engineering Sparse Gene Regulatory Networks Using Cubature Kalman Filter and Compressed Sensing Wed, 08 May 2013 11:21:31 +0000 This paper proposes a novel algorithm for inferring gene regulatory networks which makes use of cubature Kalman filter (CKF) and Kalman filter (KF) techniques in conjunction with compressed sensing methods. The gene network is described using a state-space model. A nonlinear model for the evolution of gene expression is considered, while the gene expression data is assumed to follow a linear Gaussian model. The hidden states are estimated using CKF. The system parameters are modeled as a Gauss-Markov process and are estimated using compressed sensing-based KF. These parameters provide insight into the regulatory relations among the genes. The Cramér-Rao lower bound of the parameter estimates is calculated for the system model and used as a benchmark to assess the estimation accuracy. The proposed algorithm is evaluated rigorously using synthetic data in different scenarios which include different number of genes and varying number of sample points. In addition, the algorithm is tested on the DREAM4 in silico data sets as well as the in vivo data sets from IRMA network. 
The proposed algorithm shows superior performance in terms of accuracy, robustness, and scalability. Amina Noor, Erchin Serpedin, Mohamed Nounou, and Hazem Nounou Copyright © 2013 Amina Noor et al. All rights reserved. Efficient Serial and Parallel Algorithms for Selection of Unique Oligos in EST Databases Mon, 08 Apr 2013 17:06:36 +0000 Obtaining unique oligos from an EST database is a problem of great importance in bioinformatics, particularly in the discovery of new genes and the mapping of the human genome. Many algorithms have been developed to find unique oligos, many of which are much less time consuming than the traditional brute force approach. An algorithm was presented by Zheng et al. (2004) which finds the solution of the unique oligos search problem efficiently. We implement this algorithm as well as several new algorithms based on some theorems included in this paper. We demonstrate how, with these new algorithms, we can obtain unique oligos much faster than with previous ones. We parallelize these new algorithms to further improve the time of finding unique oligos. All algorithms are run on ESTs obtained from a Barley EST database. Manrique Mata-Montero, Nabil Shalaby, and Bradley Sheppard Copyright © 2013 Manrique Mata-Montero et al. All rights reserved. Correction of Spatial Bias in Oligonucleotide Array Data Wed, 13 Mar 2013 15:09:36 +0000 Background. Oligonucleotide microarrays allow for high-throughput gene expression profiling assays. The technology relies on the fundamental assumption that observed hybridization signal intensities (HSIs) for each intended target, on average, correlate with their target’s true concentration in the sample. However, systematic, nonbiological variation from several sources undermines this hypothesis. Background hybridization signal has been previously identified as one such important source, one manifestation of which appears in the form of spatial autocorrelation. Results. 
We propose an algorithm, pyn, for the elimination of spatial autocorrelation in HSIs, exploiting the duality of desirable mutual information shared by probes in a common probe set and undesirable mutual information shared by spatially proximate probes. We show that this correction procedure reduces spatial autocorrelation in HSIs; increases HSI reproducibility across replicate arrays; increases differentially expressed gene detection power; and performs better than previously published methods. Conclusions. The proposed algorithm increases both precision and accuracy, while requiring virtually no changes to users’ current analysis pipelines: the correction consists merely of a transformation of raw HSIs (e.g., CEL files for Affymetrix arrays). A free, open-source implementation is provided as an R package, compatible with standard Bioconductor tools. The approach may also be tailored to other platform types and other sources of bias. Philippe Serhal and Sébastien Lemieux Copyright © 2013 Philippe Serhal and Sébastien Lemieux. All rights reserved. Gene Regulation, Modulation, and Their Applications in Gene Expression Data Analysis Wed, 13 Mar 2013 11:03:18 +0000 Common microarray and next-generation sequencing data analyses concentrate on tumor subtype classification, marker detection, and transcriptional regulation discovery during biological processes by exploring the correlated gene expression patterns and their shared functions. Genetic regulatory network (GRN) based approaches have been employed in many large studies in order to scrutinize for dysregulation and potential treatment controls. In addition to gene regulation and network construction, the concept of the network modulator that has significant systemic impact has been proposed, and detection algorithms have been developed in past years. Here we provide a unified mathematical description of these methods, followed by a brief survey of these modulator identification algorithms. 
As an early attempt to bring a new RNA regulation mechanism, competitive endogenous RNA (ceRNA), into the modulator framework, we provide two applications that illustrate the network construction, the modulation effect, and the preliminary findings from these networks. The methods we surveyed and developed are used to dissect the regulated network under different modulators. Beyond these applications, the concept of “modulation” can be adapted to various biological mechanisms to discover novel gene regulation mechanisms. Mario Flores, Tzu-Hung Hsiao, Yu-Chiao Chiu, Eric Y. Chuang, Yufei Huang, and Yidong Chen Copyright © 2013 Mario Flores et al. All rights reserved. Spectral Analysis on Time-Course Expression Data: Detecting Periodic Genes Using a Real-Valued Iterative Adaptive Approach Thu, 28 Feb 2013 15:42:47 +0000 Time-course expression profiles and methods for spectrum analysis have been applied for detecting transcriptional periodicities, which are valuable patterns to unravel genes associated with cell cycle and circadian rhythm regulation. However, most of the proposed methods suffer, to a certain extent, from restrictions and high false-positive rates. Additionally, in some experiments, arbitrarily irregular sampling times as well as the presence of high noise and small sample sizes make accurate detection a challenging task. A novel scheme for detecting periodicities in time-course expression data is proposed, in which a real-valued iterative adaptive approach (RIAA), originally proposed for signal processing, is applied for periodogram estimation. The inferred spectrum is then analyzed using Fisher’s hypothesis test. With a proper p-value threshold, periodic genes can be detected. A periodic signal, two nonperiodic signals, and four sampling strategies were considered in the simulations, including both bursts and drops. In addition, two real yeast datasets were used for validation.
The simulations and real data analysis reveal that RIAA can perform competitively with the existing algorithms. The advantage of RIAA is manifested when the expression data are highly irregularly sampled, and when the number of cycles covered by the sampling time points is very small. Kwadwo S. Agyepong, Fang-Han Hsu, Edward R. Dougherty, and Erchin Serpedin Copyright © 2013 Kwadwo S. Agyepong et al. All rights reserved. Identification of Robust Pathway Markers for Cancer through Rank-Based Pathway Activity Inference Wed, 27 Feb 2013 09:47:10 +0000 One important problem in translational genomics is the identification of reliable and reproducible markers that can be used to discriminate between different classes of a complex disease, such as cancer. The typical small-sample setting makes the prediction of such markers very challenging, and various approaches have been proposed to address this problem. For example, it has been shown that pathway markers, which aggregate the gene activities in the same pathway, tend to be more robust than gene markers. Furthermore, the use of gene expression ranking has been shown to be robust to batch effects and can lead to more interpretable results. In this paper, we propose an enhanced pathway activity inference method that uses gene ranking to predict the pathway activity in a probabilistic manner. The main focus of this work is on identifying robust pathway markers that can ultimately lead to robust classifiers with reproducible performance across datasets. Simulation results based on multiple breast cancer datasets show that the proposed inference method identifies better pathway markers that can predict breast cancer metastasis with higher accuracy. Moreover, the identified pathway markers can lead to better classifiers with more consistent classification performance across independent datasets. Navadon Khunlertgit and Byung-Jun Yoon Copyright © 2013 Navadon Khunlertgit and Byung-Jun Yoon.
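The robustness of rank-based scores to batch effects, mentioned in the pathway-marker abstract above, can be illustrated with a minimal sketch. This is not the authors' probabilistic inference method: it swaps in a much simpler score, the mean within-sample rank of a pathway's member genes, and the function names, gene labels, and the tie-breaking rule are all assumptions made for illustration.

```python
from typing import Dict, Iterable


def within_sample_ranks(expression: Dict[str, float]) -> Dict[str, int]:
    """Rank the genes of one sample; rank 1 is the lowest expression.

    Ties are broken by dictionary order (an arbitrary choice for this sketch).
    """
    ordered = sorted(expression, key=lambda gene: expression[gene])
    return {gene: rank for rank, gene in enumerate(ordered, start=1)}


def pathway_activity(expression: Dict[str, float], pathway: Iterable[str]) -> float:
    """Hypothetical score: mean rank of pathway members, scaled to (0, 1]."""
    ranks = within_sample_ranks(expression)
    members = [gene for gene in pathway if gene in ranks]
    if not members:
        raise ValueError("no pathway gene was measured in this sample")
    return sum(ranks[gene] for gene in members) / (len(members) * len(ranks))


# Toy sample and a toy two-gene pathway (made-up labels and values).
sample = {"A": 1.0, "B": 5.0, "C": 2.0, "D": 9.0}
pathway = ["B", "D"]

# Simulate a batch effect as a monotone rescaling of every measurement.
shifted = {gene: 2.0 * value + 10.0 for gene, value in sample.items()}

score = pathway_activity(sample, pathway)
score_shifted = pathway_activity(shifted, pathway)
```

Because ranks depend only on the ordering of genes within a sample, any strictly increasing distortion of the raw values leaves the score unchanged, which is the property the abstract appeals to.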
All rights reserved. An Overview of the Statistical Methods Used for Inferring Gene Regulatory Networks and Protein-Protein Interaction Networks Thu, 21 Feb 2013 15:22:25 +0000 The large influx of data from high-throughput genomic and proteomic technologies has encouraged researchers to seek approaches for understanding the structure of gene regulatory networks and proteomic networks. This work reviews some of the most important statistical methods used for modeling of gene regulatory networks (GRNs) and protein-protein interaction (PPI) networks. The paper focuses on the recent advances in the statistical graphical modeling techniques, state-space representation models, and information theoretic methods that were proposed for inferring the topology of GRNs. It appears that the problem of inferring the structure of PPI networks is quite different from that of GRNs. Clustering and probabilistic graphical modeling techniques are of prime importance in the statistical inference of PPI networks, and some of the recent approaches using these techniques are also reviewed in this paper. Performance evaluation criteria for the approaches used for modeling GRNs and PPI networks are also discussed. Amina Noor, Erchin Serpedin, Mohamed Nounou, Hazem Nounou, Nady Mohamed, and Lotfi Chouchane Copyright © 2013 Amina Noor et al. All rights reserved. Using Protein Clusters from Whole Proteomes to Construct and Augment a Dendrogram Wed, 20 Feb 2013 08:15:54 +0000 In this paper we present a new ab initio approach for constructing an unrooted dendrogram using protein clusters, an approach that has the potential for estimating relationships among several thousands of species based on their putative proteomes. We employ an open-source software program called pClust that was developed for use in metagenomic studies.
Sequence alignment is performed by pClust using the Smith-Waterman algorithm, which is known to give optimal alignment and, hence, greater accuracy than BLAST-based methods. Protein clusters generated by pClust are used to create protein profiles for each species in the dendrogram, these profiles forming a correlation filter library for use with a new taxon. To augment the dendrogram with a new taxon, a protein profile for the taxon is created using BLASTp, and this new taxon is placed into a position within the dendrogram corresponding to the highest correlation with profiles in the correlation filter library. This work was initiated because of our interest in plasmids, and each step is illustrated using proteomes from Gram-negative bacterial plasmids. Proteomes for 527 plasmids were used to generate the dendrogram, and, to demonstrate the utility of the insertion algorithm, twelve recently sequenced pAKD plasmids were used to augment the dendrogram. Yunyun Zhou, Douglas R. Call, and Shira L. Broschat Copyright © 2013 Yunyun Zhou et al. All rights reserved. Solving the 0/1 Knapsack Problem by a Biomolecular DNA Computer Mon, 18 Feb 2013 07:55:04 +0000 Solving some mathematical problems, such as NP-complete problems, on conventional silicon-based computers is problematic and takes a very long time. DNA computing is an alternative method of computing which uses DNA molecules for computing purposes. DNA computers have massive degrees of parallel processing capability. The massive parallel processing characteristic of DNA computers is of particular interest in solving NP-complete and hard combinatorial problems. NP-complete problems such as the knapsack problem and other hard combinatorial problems can be easily solved by DNA computers in a very short period of time compared to conventional silicon-based computers. Sticker-based DNA computing is one of the methods of DNA computing. In this paper, sticker-based DNA computing was used for solving the 0/1 knapsack problem.
First, a biomolecular solution space was constructed by using appropriate DNA memory complexes. Then, by the application of a sticker-based parallel algorithm using biological operations, the knapsack problem was solved in polynomial time. Hassan Taghipour, Mahdi Rezaei, and Heydar Ali Esmaili Copyright © 2013 Hassan Taghipour et al. All rights reserved. MRMPath and MRMutation, Facilitating Discovery of Mass Transitions for Proteotypic Peptides in Biological Pathways Using a Bioinformatics Approach Tue, 29 Jan 2013 14:45:02 +0000 Quantitative proteomics applications in mass spectrometry depend on the knowledge of the mass-to-charge ratio (m/z) values of proteotypic peptides for the proteins under study and their product ions. MRMPath and MRMutation, web-based, platform-independent bioinformatics software tools, facilitate the recovery of this information by biologists. MRMPath utilizes publicly available information related to biological pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. All the proteins involved in pathways of interest are recovered and processed in silico to extract information relevant to quantitative mass spectrometry analysis. Peptides may also be subjected to automated BLAST analysis to determine whether they are proteotypic. MRMutation catalogs and makes available, following processing, known (mutant) variants of proteins from the current UniProtKB database. All these results, available via the web from well-maintained, public databases, are written to an Excel spreadsheet, which the user can download and save. MRMPath and MRMutation can be freely accessed. As a system that seeks to allow two or more resources to interoperate, MRMPath represents an advance in bioinformatics tool development. As a practical matter, the MRMPath automated approach represents significant time savings to researchers. Chiquito Crasto, Chandrahas Narne, Mikako Kawai, Landon Wilson, and Stephen Barnes Copyright © 2013 Chiquito Crasto et al.
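The kind of in silico processing that the MRMPath abstract above refers to (deriving precursor and product-ion m/z values for peptides) can be sketched as follows. This is not the MRMPath implementation: the function names are hypothetical, the digestion rule is the common simplified trypsin rule (cleave after K or R unless the next residue is P), no missed cleavages or modifications are handled, and the masses are standard monoisotopic values.

```python
import re
from typing import List

# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added to the residue sum of an intact peptide
PROTON = 1.007276   # mass of the charge carrier


def tryptic_peptides(protein: str) -> List[str]:
    """Split after K or R, except when the next residue is P (Python 3.7+)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def precursor_mz(peptide: str, charge: int) -> float:
    """m/z of the intact peptide carrying `charge` protons."""
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


def y_ions(peptide: str) -> List[float]:
    """Singly charged y-ion m/z values, y1 first (C-terminal fragments)."""
    return [
        sum(RESIDUE_MASS[aa] for aa in peptide[-i:]) + WATER + PROTON
        for i in range(1, len(peptide))
    ]


# Toy sequence, not a real protein.
peptides = tryptic_peptides("AKRPGK")
transitions = {p: (precursor_mz(p, 2), y_ions(p)) for p in peptides}
```

Each entry of `transitions` pairs a doubly charged precursor m/z with candidate singly charged y-ion m/z values, which is the shape of information an MRM assay needs; choosing which peptides are proteotypic would still require a uniqueness check such as the BLAST step the abstract describes.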
All rights reserved. Statistical Analysis of Terminal Extensions of Protein β-Strand Pairs Mon, 28 Jan 2013 14:05:10 +0000 The long-range interactions required for accurate prediction of tertiary structures of β-sheet-containing proteins are still difficult to simulate. To remedy this problem and to facilitate β-sheet structure predictions, many computational efforts have been made. However, known efforts on β-sheets mainly focus on interresidue contacts or amino acid partners. In this study, to go one step further, we studied β-sheets on the strand level, in which a statistical analysis was made on the terminal extensions of paired β-strands. In most cases, the two paired β-strands have different lengths, and terminal extensions exist. The terminal extensions are the extended parts of the paired strands beyond the common paired part. However, we found that the best pairing required a terminal alignment, and β-strands tend to pair to make bigger common parts. As a result, 96.97% of β-strand pairs have a ratio of 25% of the paired common part to the whole length. Also 94.26% and 95.98% of β-strand pairs have a ratio of 40% of the paired common part to the length of the two β-strands, respectively. Interstrand register predictions by searching interacting β-strands from several alternative offsets should comply with this rule to reduce the computational search space and improve the performance of algorithms. Ning Zhang, Shan Gao, Lei Zhang, Jishou Ruan, and Tao Zhang Copyright © 2013 Ning Zhang et al. All rights reserved.
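The quantities discussed in the β-strand abstract above (the common paired part, the terminal extensions, and their ratios) can be made concrete with a small sketch. The abstract does not define "whole length" precisely, so the code assumes it means the total span covered by the two strands once they are placed in register; the function name, the offset convention, and the way the 25% and 40% figures are used as lower-bound filters are likewise assumptions for illustration.

```python
from typing import NamedTuple


class PairingRatios(NamedTuple):
    common: int          # residues in the common paired part
    to_whole: float      # common / total span of both strands (assumed definition)
    to_strand_a: float   # common / length of strand A
    to_strand_b: float   # common / length of strand B


def pairing_ratios(len_a: int, len_b: int, offset: int) -> PairingRatios:
    """Strand A occupies positions [0, len_a); strand B occupies
    [offset, offset + len_b) in the same register coordinates."""
    if len_a <= 0 or len_b <= 0:
        raise ValueError("strand lengths must be positive")
    start = max(0, offset)
    end = min(len_a, offset + len_b)
    common = max(0, end - start)
    whole = max(len_a, offset + len_b) - min(0, offset)
    return PairingRatios(common, common / whole, common / len_a, common / len_b)


def plausible_offsets(len_a: int, len_b: int,
                      min_whole: float = 0.25, min_strand: float = 0.40):
    """Keep only offsets whose ratios reach the thresholds quoted in the
    abstract, treated here as lower bounds (an interpretation)."""
    kept = []
    for offset in range(-len_b + 1, len_a):
        r = pairing_ratios(len_a, len_b, offset)
        if (r.to_whole >= min_whole and r.to_strand_a >= min_strand
                and r.to_strand_b >= min_strand):
            kept.append(offset)
    return kept


aligned = pairing_ratios(6, 4, 1)    # short strand fully paired
shifted = pairing_ratios(6, 4, 4)    # only two residues in common
candidates = plausible_offsets(6, 4)
```

Filtering candidate offsets this way is one reading of how such a rule could shrink the search space for interstrand register prediction: offsets that leave only a small common part are discarded before any more expensive scoring.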