﻿<?xml version="1.0" encoding="utf-8"?><rss version="2.0"><channel><title>EURASIP Journal on Bioinformatics and Systems Biology</title><link>http://www.hindawi.com</link><description>The latest articles from Hindawi Publishing Corporation</description><copyright>&amp;#169; 2012, Hindawi Publishing Corporation. All rights reserved.</copyright><item><title>TRII: A Probabilistic Scoring of Drosophila melanogaster Translation Initiation Sites</title><link>http://www.hindawi.com/journals/bsb/2010/814127/</link><description>Relative individual information is a measurement that scores the quality of DNA- and RNA-binding sites for biological machines. The development of analytical approaches to increase the power of this scoring method will improve its utility in evaluating the functions of motifs. In this study, the scoring method was applied to potential translation initiation sites in Drosophila to compute Translation Relative Individual Information (TRII) scores. The weight matrix at the core of the scoring method was optimized based on high-confidence translation initiation sites identified by using a progressive partitioning approach. Comparing the distributions of TRII scores for sites of interest with those for high-confidence translation initiation sites and random sequences provides a new methodology for assessing the quality of translation initiation sites. The optimized weight matrices can also be used to describe the consensus at translation initiation sites, providing a quantitative measure of preferred and avoided nucleotides at each position.</description><Author>Michael P. Weir and Michael D. Rice</Author><copyright>Copyright &amp;#xa9; 2010 Michael P. Weir and Michael D. Rice. All rights reserved.</copyright></item><item><title>A Hypothesis Test for Equality of Bayesian Network Models</title><link>http://www.hindawi.com/journals/bsb/2010/947564/</link><description>Bayesian network models are commonly used to model gene
expression data. Some applications require a comparison of the network
structure of a set of genes between varying phenotypes. In principle, separately fit models can be directly compared, but it is difficult to assign statistical significance to any observed differences. There would therefore be an
advantage to the development of a rigorous hypothesis test for homogeneity
of network structure.
In this paper, a generalized likelihood ratio test based on Bayesian network models is developed, with significance level estimated using permutation
replications. In order to be computationally feasible, a number of algorithms
are introduced. First, a method for approximating multivariate distributions
due to Chow and Liu (1968) is adapted, permitting the polynomial-time calculation of a maximum likelihood Bayesian network with maximum indegree of
one. Second, sequential testing principles are applied to the permutation test,
allowing significant reduction of computation time while preserving reported
error rates used in multiple testing. The method is applied to gene-set analysis, using two sets of experimental data, and some advantage to a pathway
modelling approach to this problem is reported.</description><Author>Anthony Almudevar</Author><copyright>Copyright &amp;#xa9; 2010 Anthony Almudevar. All rights reserved.</copyright></item><item><title>A Bayesian Analysis for Identifying DNA Copy Number Variations Using a Compound Poisson Process</title><link>http://www.hindawi.com/journals/bsb/2010/268513/</link><description>To study chromosomal aberrations that may lead to cancer formation or genetic diseases, the array-based Comparative Genomic Hybridization (aCGH) technique is often used for detecting DNA copy number variants (CNVs). Various methods have been developed for gaining CNVs information based on aCGH data. However, most of these methods make use of the log-intensity ratios in aCGH data without taking advantage of other information such as the DNA probe (e.g., biomarker) positions/distances contained in the data. Motivated by the specific features of aCGH data, we developed a novel method that takes into account the estimation of a change point or locus of the CNV in aCGH data with its associated biomarker position on the chromosome using a compound Poisson process. We used a Bayesian approach to derive the posterior probability for the estimation of the CNV locus. To detect loci of multiple CNVs in the data, a sliding window process combined with our derived Bayesian posterior probability was proposed. To evaluate the performance of the method in the estimation of the CNV locus, we first performed simulation studies. Finally, we applied our approach to real data from aCGH experiments, demonstrating its applicability.</description><Author>Jie Chen, Ayten Yi&amp;#287;iter, Yu-Ping Wang, and Hong-Wen Deng</Author><copyright>Copyright &amp;#xa9; 2010 Jie Chen et al. 
All rights reserved.</copyright></item><item><title>Polynomial-Time Algorithm for Controllability Test of a Class of Boolean Biological Networks</title><link>http://www.hindawi.com/journals/bsb/2010/210685/</link><description>In recent years, Boolean-network-model-based approaches to dynamical analysis of complex biological networks such as gene regulatory networks have been extensively studied. One of the fundamental problems in control theory of such networks is the problem of determining whether a given substance quantity can be arbitrarily controlled by operating the other substance quantities, which we call the controllability problem. This paper proposes a polynomial-time algorithm for solving this problem. Although the algorithm is based on a sufficient condition for controllability, it is easily computable for a wider class of large-scale biological networks compared with the existing approaches. A key to this success in our approach is to give up computing Boolean operations in a rigorous way and to exploit an adjacency matrix of a directed graph induced by a Boolean network. By applying the proposed approach to a neurotransmitter signaling pathway, it is shown that it is effective.</description><Author>Koichi Kobayashi, Jun-Ichi Imura, and Kunihiko Hiraishi</Author><copyright>Copyright &amp;#xa9; 2010 Koichi Kobayashi et al. All rights reserved.</copyright></item><item><title>Progression Analysis and Stage Discovery in Continuous Physiological Processes Using Image Computing</title><link>http://www.hindawi.com/journals/bsb/2010/107036/</link><description>We propose an image computing-based method for quantitative analysis of continuous physiological processes that can be sensed by medical imaging and demonstrate its application to the analysis of morphological alterations of the bone structure, which correlate with the progression of osteoarthritis (OA). 
The purpose of the analysis is to quantitatively estimate OA progression in a fashion that can assist in understanding the pathophysiology of the disease. Ultimately, the texture analysis will be able to provide an alternative OA scoring method, which can potentially reflect the progression of the disease in a more direct fashion compared to the existing clinically utilized classification schemes based on radiology. This method can be useful not just for studying the nature of OA, but also for developing and testing the effect of drugs and treatments. While in this paper we demonstrate the application of the method to osteoarthritis, its generality makes it suitable for the analysis of other progressive clinical conditions that can be diagnosed and prognosed by using medical imaging.</description><Author>Lior Shamir, Salim Rahimi, Nikita Orlov, Luigi Ferrucci, and Ilya G. Goldberg</Author><copyright>Copyright &amp;#x00A9; 2010 Lior Shamir et al. All rights reserved.</copyright></item><item><title>A New-Fangled FES-k-Means Clustering Algorithm for  Disease Discovery and Visual Analytics</title><link>http://www.hindawi.com/journals/bsb/2010/746021/</link><description>The central purpose of this study is to further evaluate the quality of the performance of a new algorithm. The study provides additional evidence on this algorithm that was designed to increase the overall efficiency of the original k-means clustering technique&amp;#8212;the Fast, Efficient, and Scalable k-means algorithm (FES-k-means). The FES-k-means algorithm uses a hybrid approach that comprises the k-d tree data structure that enhances the nearest neighbor query, the original k-means algorithm, and an adaptation rate proposed by Mashor. This algorithm was tested using two real datasets and one synthetic dataset. It was employed twice on all three datasets: once on data trained by the innovative MIL-SOM method and then on the actual untrained data in order to evaluate its competence. 
This two-step approach of data training prior to clustering provides a solid foundation for knowledge discovery and data mining, otherwise unclaimed by clustering methods alone. The benefits of this method are that it produces clusters similar to the original k-means method at a much faster rate as shown by runtime comparison data; and it provides efficient analysis of large geospatial data with implications for disease mechanism discovery. From a disease mechanism discovery perspective, it is hypothesized that the linear-like pattern of elevated blood lead levels discovered in the city of Chicago may be spatially linked to the city&amp;#39;s water service lines.</description><Author>Tonny J. Oyana</Author><copyright>Copyright &amp;#x00A9; 2010 Tonny J. Oyana. All rights reserved.</copyright></item><item><title>Selection of Statistical Thresholds in Graphical Models</title><link>http://www.hindawi.com/journals/bsb/2009/878013/</link><description>Reconstruction of gene regulatory networks based on experimental data usually relies on statistical evidence, necessitating the choice of a statistical threshold which defines a significant biological effect. Approaches to this problem found in the literature range from rigorous multiple testing procedures to ad hoc P-value cut-off points. However, when the data implies graphical structure, it should be possible to exploit this feature in the threshold selection process. In this article we propose a procedure based on this principle. Using coding theory we devise a measure of graphical structure, for example, highly connected
nodes or chain structure. The measure for a particular graph can be compared to that of a random graph and structure inferred on that basis. By varying the statistical threshold the maximum deviation from random structure can be estimated, and the threshold is then chosen on that basis. A global test for graph structure follows naturally.</description><Author>Anthony Almudevar</Author><copyright>Copyright &amp;#x00A9; 2009 Anthony Almudevar. All rights reserved.</copyright></item><item><title>Impact of Missing Value Imputation on Classification for DNA Microarray Gene Expression Data&amp;#8212;A Model-Based Study</title><link>http://www.hindawi.com/journals/bsb/2009/504069/</link><description>Many missing-value (MV) imputation methods have been developed for microarray data, but only a few studies have investigated the relationship between MV imputation and classification accuracy. Furthermore, these studies are problematic in fundamental steps such as MV generation and classifier error estimation. In this work, we carry out a model-based study that addresses some of the issues in previous studies. Six popular imputation algorithms, two feature selection methods, and three classification rules are considered. The results suggest that it is beneficial to apply MV imputation when the noise level is high, variance is small, or gene-cluster correlation is strong, under small to moderate MV rates. In these cases, if data quality metrics are available, then it may be helpful to consider the data point with poor quality as missing and apply one of the most robust imputation algorithms to estimate the true signal based on the available high-quality data points. However, at large MV rates, we conclude that imputation methods are not recommended. 
Regarding the MV rate, our results indicate the presence of a peaking phenomenon: performance of imputation methods actually improves initially as the MV rate increases, but after an optimum point, performance quickly deteriorates with increasing MV rates.</description><Author>Youting Sun, Ulisses Braga-Neto, and Edward R. Dougherty</Author><copyright>Copyright &amp;#x00A9; 2009 Youting Sun et al. All rights reserved.</copyright></item><item><title>Using a State-Space Model and Location Analysis to Infer Time-Delayed Regulatory Networks</title><link>http://www.hindawi.com/journals/bsb/2009/484601/</link><description>Computational gene regulation models provide a means for scientists to draw biological inferences from time-course gene expression data. Based on the state-space approach, we developed a new modeling tool for inferring gene regulatory networks, called time-delayed Gene Regulatory Networks (tdGRNs). tdGRN takes time-delayed regulatory relationships into consideration when developing the model. In addition, a priori biological knowledge from genome-wide location analysis is incorporated into the structure of the gene regulatory network. tdGRN is evaluated on both an artificial dataset and a published gene expression data set. It not only determines regulatory relationships that are known to exist but also uncovers potential new ones. The results indicate that the proposed tool is effective in inferring gene regulatory relationships with time delay. tdGRN is complementary to existing methods for inferring gene regulatory networks. The novel part of the proposed tool is that it is able to infer time-delayed regulatory relationships.</description><Author>Chushin Koh, Fang-Xiang Wu, Gopalan Selvaraj, and Anthony J. Kusalik</Author><copyright>Copyright &amp;#x00A9; 2009 Chushin Koh et al. 
All rights reserved.</copyright></item><item><title>Stochastic Simulation of Delay-Induced Circadian Rhythms in Drosophila</title><link>http://www.hindawi.com/journals/bsb/2009/386853/</link><description>Circadian rhythms are ubiquitous in all eukaryotes and some prokaryotes. Several computational models with or without time delays have been developed for circadian rhythms. Exact
stochastic simulations have been carried out for several models without time delays, but no
exact stochastic simulation has been done for models with delays. In this paper, we proposed
a detailed and a reduced stochastic model with delays for circadian rhythms in Drosophila
based on two deterministic models of Smolen et al. and employed exact stochastic simulation
to simulate circadian oscillations. Our simulations showed that both models can produce sustained
oscillations and that the oscillation is robust to noise in the sense that there is very little
variability in oscillation period although there are significant random fluctuations in oscillation
peaks. Moreover, although average time delays are essential to simulation of oscillation,
random changes in time delays within a certain range around a fixed average time delay cause
little variability in the oscillation period. Our simulation results also showed that both models
are robust to parameter variations and that oscillation can be entrained by light/dark cycles.
Our simulations further demonstrated that within a reasonable range around the experimental
result, the rates at which the dclock and per promoters switch back and forth between activated and
repressed sites have little impact on oscillation period.</description><Author>Zhouyi Xu and Xiaodong Cai</Author><copyright>Copyright &amp;#x00A9; 2009 Zhouyi Xu and Xiaodong Cai. All rights reserved.</copyright></item><item><title>Modelling Transcriptional Regulation with a Mixture of Factor Analyzers and Variational Bayesian Expectation Maximization</title><link>http://www.hindawi.com/journals/bsb/2009/601068/</link><description>Understanding the mechanisms of gene transcriptional regulation through
analysis of high-throughput postgenomic data is one of the central problems of computational systems biology. Various approaches have been proposed, but most of them fail to address at least one of the following
objectives: (1) allow for the fact that transcription factors are potentially
subject to posttranscriptional regulation; (2) allow for the fact that transcription factors cooperate as a functional complex in regulating gene expression, and (3) provide a model and a learning algorithm with manageable computational complexity. The objective of the present study is to propose and test a method that addresses these three issues. The model we employ is a mixture of factor analyzers, in which the latent variables correspond to different transcription factors, grouped into complexes or modules. We pursue inference in a Bayesian framework, using the Variational Bayesian Expectation Maximization (VBEM) algorithm for approximate inference of the posterior distributions of the model parameters, and estimation of a lower bound on the marginal likelihood for model selection. We have evaluated the performance of the proposed
method on three criteria: activity profile reconstruction, gene clustering, and network inference.</description><Author>Kuang Lin and Dirk Husmeier</Author><copyright>Copyright &amp;#x00A9; 2009 Kuang Lin and Dirk Husmeier. All rights reserved.</copyright></item><item><title>Integrating Biosystem Models Using Waveform Relaxation</title><link>http://www.hindawi.com/journals/bsb/2008/308623/</link><description>Modelling in systems biology often involves the integration of component models into larger composite models. How to do this systematically and efficiently is a significant challenge: coupling of components can be unidirectional or bidirectional, and of variable strengths. We adapt the waveform relaxation (WR) method for parallel computation of ODEs as a general methodology for computing systems of linked submodels. Four test cases are presented: (i) a cascade of unidirectionally and bidirectionally coupled harmonic oscillators, (ii) deterministic and stochastic simulations of calcium oscillations, (iii) single cell calcium oscillations showing complex behaviour such as periodic and chaotic bursting, and (iv) a multicellular calcium model for a cell plate of hepatocytes. We conclude that WR provides a flexible means to deal with multitime-scale computation and model heterogeneity. Global solutions over time can be captured independently of the solution techniques for the individual components, which may be distributed in different computing environments.</description><Author>Linzhong Li, Robert M. Seymour, and Stephen Baigent</Author><copyright>Copyright &amp;#x00A9; 2008 Linzhong Li et al. All rights reserved.</copyright></item><item><title>Algorithms and Complexity Analyses for Control of Singleton Attractors in Boolean Networks</title><link>http://www.hindawi.com/journals/bsb/2008/521407/</link><description>A Boolean network (BN) is a mathematical model of genetic networks. We propose several algorithms for control of singleton attractors in BN. 
We theoretically estimate the average-case time complexities of the proposed algorithms and confirm them by computer experiments. The results suggest the importance of gene ordering. In particular, setting internal nodes ahead yields shorter computational time than setting external nodes ahead in various types of algorithms. We also present a heuristic algorithm which does not look for the optimal solution but for the solution whose computational time is shorter than that of the exact algorithms.</description><Author>Morihiro Hayashida, Takeyuki Tamura, Tatsuya Akutsu, Shu-Qin Zhang, and Wai-Ki Ching</Author><copyright>Copyright &amp;#x00A9; 2008 Morihiro Hayashida et al. All rights reserved.</copyright></item><item><title>Using Temporal Correlation in Factor Analysis for Reconstructing Transcription Factor Activities</title><link>http://www.hindawi.com/journals/bsb/2008/172840/</link><description>Two-level gene regulatory networks consist of the transcription factors (TFs) in the top level and their regulated genes in the second level. The expression profiles of the regulated genes are the observed high-throughput data given by experiments such as microarrays. The activity profiles of the TFs are treated as hidden variables, as is the connectivity matrix that indicates the regulatory relationships of TFs with their regulated genes. Factor analysis (FA), as well as other methods such as the network component algorithm, has been suggested for reconstructing gene regulatory networks and also for predicting TF activities. These methods have been applied to E. coli and yeast data with the assumption that these datasets consist of independent and identically distributed samples. Thus, the main drawback of these algorithms is that they ignore any time correlation existing within the TF profiles. In this paper, we extend previously studied FA algorithms to include time correlation within the transcription factors. 
At the same time, we consider connectivity matrices that are sparse in order to capture the existing sparsity present in gene regulatory networks. The TFs activity profiles obtained by this approach are significantly smoother than profiles from previous FA algorithms. The periodicities in profiles from yeast expression data become prominent in our reconstruction. Moreover, the strength of the correlation between time points is estimated and can be used to assess the suitability of the experimental time interval.</description><Author>Iosifina Pournara and Lorenz Wernisch</Author><copyright>Copyright &amp;#x00A9; 2008 Iosifina Pournara and Lorenz Wernisch. All rights reserved.</copyright></item><item><title>Inference of Boolean Networks Using Sensitivity Regularization</title><link>http://www.hindawi.com/journals/bsb/2008/780541/</link><description>The inference of genetic regulatory networks from global measurements of gene expressions is an important problem in computational biology. Recent studies suggest that such dynamical molecular systems are poised at a critical phase transition between an ordered and a disordered phase, affording the ability to balance stability and adaptability while coordinating complex macroscopic behavior. We investigate whether incorporating this dynamical system-wide property as an assumption in the inference process is beneficial in terms of reducing the inference error of the designed network. Using Boolean networks, for which there are well-defined notions of ordered, critical, and chaotic dynamical regimes as well as well-studied inference procedures, we analyze the expected inference error relative to deviations in the networks' dynamical regimes from the assumption of criticality. 
We demonstrate that taking criticality into account via a penalty term in the inference procedure improves the accuracy of prediction both in terms of state transitions and network wiring, particularly for small sample sizes.</description><Author>Wenbin Liu, Harri L&amp;#228;hdesm&amp;#228;ki, Edward R. Dougherty, and Ilya Shmulevich</Author><copyright>Copyright &amp;#x00A9; 2008 Wenbin Liu et al. All rights reserved.</copyright></item><item><title>Gene Regulatory Network Reconstruction Using Conditional Mutual Information</title><link>http://www.hindawi.com/journals/bsb/2008/253894/</link><description>The inference of gene regulatory networks from expression data is an important area of research that provides insight into the inner workings of a biological system. Relevance-network-based approaches provide a simple and easily scalable solution to the understanding of interactions between genes. Until now, most work based on relevance networks has focused on the discovery of direct regulation using the correlation coefficient or mutual information. However, some of the more complicated interactions, such as interactive regulation and coregulation, are not easily detected. In this work, we propose a relevance network model for gene regulatory network inference which employs both mutual information and conditional mutual information to determine the interactions between genes. For this purpose, we propose a conditional mutual information estimator based on adaptive partitioning which allows us to condition on both discrete and continuous random variables. We provide experimental results that demonstrate that the
proposed regulatory network inference algorithm can provide better performance when the target network contains coregulated and interactively regulated genes.</description><Author>Kuo-Ching Liang and Xiaodong Wang</Author><copyright>Copyright &amp;#x00A9; 2008 Kuo-Ching Liang and Xiaodong Wang. All rights reserved.</copyright></item><item><title>Detecting Periodic Genes from Irregularly Sampled Gene Expressions: A Comparison Study</title><link>http://www.hindawi.com/journals/bsb/2008/769293/</link><description>Time series microarray measurements of gene expressions have been exploited to discover genes involved in cell cycles. Due to experimental constraints, most
microarray observations are obtained through irregular sampling. In this paper three
popular spectral analysis schemes, namely, Lomb-Scargle, Capon and missing-data
amplitude and phase estimation (MAPES), are compared in terms of their ability
and efficiency to recover periodically expressed genes. Based on in silico experiments for microarray measurements of Saccharomyces cerevisiae, Lomb-Scargle is found to be the most efficacious scheme. 149 genes are then identified to be periodically expressed in the Drosophila melanogaster data set.</description><Author>Wentao Zhao, Kwadwo Agyepong, Erchin Serpedin, and Edward R. Dougherty</Author><copyright>Copyright &amp;#x00A9; 2008 Wentao Zhao et al. All rights reserved.</copyright></item><item><title>Recovering Genetic Regulatory Networks from Chromatin Immunoprecipitation and Steady-State Microarray Data</title><link>http://www.hindawi.com/journals/bsb/2008/248747/</link><description>Recent advances in high-throughput DNA microarrays and chromatin immunoprecipitation (ChIP) assays have enabled the learning of the structure and functionality of genetic regulatory networks. In light of these heterogeneous data sets, this paper proposes a novel approach for reconstruction of genetic regulatory networks
based on the posterior probabilities of gene regulations. Built within the
framework of Bayesian statistics and computational Monte Carlo techniques, the
proposed approach avoids the dichotomy of classifying gene interactions as either
being connected or disconnected, thereby significantly reducing the inference errors. Simulation results corroborate the superior performance of the proposed approach
relative to the existing state-of-the-art algorithms. A genetic regulatory network
for Saccharomyces cerevisiae is inferred based on the published real data sets, and biologically meaningful results are discussed.</description><Author>Wentao Zhao, Erchin Serpedin, and Edward R. Dougherty</Author><copyright>Copyright &amp;#x00A9; 2008 Wentao Zhao et al. All rights reserved.</copyright></item><item><title>Optimal Constrained Stationary Intervention in Gene Regulatory Networks</title><link>http://www.hindawi.com/journals/bsb/2008/620767/</link><description>A key objective of gene network modeling
is to develop intervention strategies to alter regulatory
dynamics in such a way as to reduce the likelihood of
undesirable phenotypes. Optimal stationary intervention
policies have been developed for gene regulation in the
framework of probabilistic Boolean networks in a number
of settings. To mitigate the possibility of detrimental side
effects, for instance, in the treatment of cancer, it may
be desirable to limit the expected number of treatments
beneath some bound. This paper formulates a general constraint
approach for optimal therapeutic intervention by
suitably adapting the reward function and then applies this
formulation to bound the expected number of treatments.
A mutated mammalian cell cycle is considered as a case
study.</description><Author>Babak Faryabi, Golnaz Vahedi, Jean-Francois Chamberland, Aniruddha Datta, and Edward R. Dougherty</Author><copyright>Copyright &amp;#x00A9; 2008 Babak Faryabi et al. All rights reserved.</copyright></item><item><title>A Time-Series-Based Feature Extraction Approach for Prediction of Protein Structural Class</title><link>http://www.hindawi.com/journals/bsb/2008/235451/</link><description>This paper presents a novel feature vector based on physicochemical properties of amino acids for the prediction of protein structural classes. The proposed method is divided into three stages. First, a discrete time-series representation of protein sequences using a physicochemical scale is provided. Next, a wavelet-based time-series technique is proposed for extracting features from the mapped amino acid sequence, and a fixed-length feature vector for classification is constructed. The proposed feature space summarizes the variance information of ten different biological properties of amino acids. Finally, an optimized support vector machine model is constructed for prediction of each protein structural class. The proposed approach is evaluated using leave-one-out cross-validation tests on two standard datasets. Comparison of our results with existing approaches shows that the overall accuracy achieved by our approach is better than that of existing methods.</description><Author>Ravi Gupta, Ankush Mittal, and Kuldip Singh</Author><copyright>Copyright &amp;#x00A9; 2008 Ravi Gupta et al. All rights reserved.</copyright></item><item><title>Which Is Better: Holdout or Full-Sample Classifier Design?</title><link>http://www.hindawi.com/journals/bsb/2008/297945/</link><description>Is it better to design a classifier and estimate its error on the full sample or to design a
classifier on a training subset and estimate its error on the holdout test subset? Full-sample
design provides the better classifier; nevertheless, one might choose holdout with the hope of better error estimation. A conservative criterion to decide the best course is to aim at a classifier whose error is less than a given bound. Then the choice between full-sample and holdout designs depends on which possesses the smaller expected bound. Using this criterion, we examine the choice between holdout and several full-sample error estimators using covariance models and a patient-data model. Full-sample design consistently outperforms holdout design. The relation between the two designs is revealed via a decomposition of the expected bound into the sum of the expected true error and the expected conditional standard deviation of the true error.</description><Author>Marcel Brun, Qian Xu, and Edward R. Dougherty</Author><copyright>Copyright &amp;#x00A9; 2008 Marcel Brun et al. All rights reserved.</copyright></item><item><title>Bayesian Hierarchical Model for Estimating Gene Expression Intensity Using Multiple Scanned Microarrays</title><link>http://www.hindawi.com/journals/bsb/2008/231950/</link><description>We propose a method for improving the quality of signal from DNA microarrays by using several scans at varying scanner sensitivities. A Bayesian latent intensity model is introduced for the analysis of such data. The method improves the accuracy with which expressions can be measured in all ranges and extends the dynamic range of measured gene expression at the high end. Our method is generic and can be applied to data from any organism, for imaging with any scanner that allows varying the laser power, and for extraction with any image analysis software. 
Results from a self-self hybridization data set illustrate an improved precision in the estimation of the expression of genes compared to what can be achieved by applying standard methods and using only a single scan.</description><Author>Rashi Gupta, Elja Arjas, Sangita Kulathinal, Andrew Thomas, and Petri Auvinen</Author><copyright>Copyright &amp;#x00A9; 2008 Rashi Gupta et al. All rights reserved.</copyright></item><item><title>Combining Evidence, Specificity, and Proximity towards the Normalization of Gene Ontology Terms in Text</title><link>http://www.hindawi.com/journals/bsb/2008/342746/</link><description>Structured information provided by manual annotation of proteins with Gene Ontology concepts represents a high-quality reliable data source for the research community. However, a limited scope of proteins is annotated due to the amount of human resources required to fully annotate each individual gene product from the literature. We introduce a novel method for automatic identification of GO terms in natural language text. The method takes into consideration several features: (1) the evidence
for a GO term given by the words occurring in text, (2) the proximity between the
words, and (3) the specificity of the GO terms based on their information content.
The method has been evaluated on the BioCreAtIvE corpus and has been compared to
current state-of-the-art methods. The precision reached 0.34 at a recall of 0.34 for the
identified terms at rank 1. In our analysis, we observe that the identification of GO
terms in the &amp;#8220;cellular component&amp;#8221; subbranch of GO is more accurate than for terms from the other two subbranches. This observation is explained by the average number of words forming the terminology over the different subbranches.</description><Author>S. Gaudan, A. Jimeno Yepes, V. Lee, and D. Rebholz-Schuhmann</Author><copyright>Copyright &amp;#x00A9; 2008 S. Gaudan et al. All rights reserved.</copyright></item><item><title>Inference of Gene Regulatory Networks Based on a Universal Minimum Description Length</title><link>http://www.hindawi.com/journals/bsb/2008/482090/</link><description>The Boolean network paradigm is a simple and effective way to interpret genomic systems, but discovering the structure of these networks remains a difficult task. The minimum description length (MDL) principle has already been used for inferring genetic regulatory networks from time-series expression data and has proven useful for recovering the directed connections in Boolean networks. However, the existing method uses an ad hoc measure of description length that necessitates a tuning parameter for artificially balancing the model and error costs and, as a result, directly conflicts with the MDL principle's implied universality. In order to surpass this difficulty, we propose a novel MDL-based method in which the description length is a theoretical measure derived from a universal normalized maximum likelihood model. The search space is reduced by applying an implementable analogue of Kolmogorov&amp;#39;s structure function. The performance of the proposed method is demonstrated on random synthetic networks, for which it is shown to improve upon previously published network inference algorithms with respect to both speed and accuracy. Finally, it is applied to time-series Drosophila gene expression measurements.</description><Author>John Dougherty, Ioan Tabus, and Jaakko Astola</Author><copyright>Copyright &amp;#x00A9; 2008 John Dougherty et al. 
All rights reserved.</copyright></item><item><title>I&amp;#x03BA;B, NF-&amp;#x03BA;B Regulation Model: Simulation Analysis of Small Number of Molecules</title><link>http://www.hindawi.com/journals/bsb/2007/025250/abs/</link><description>The regulation of I&amp;#x03BA;B, NF-&amp;#x03BA;B is of foremost interest in biology as the transcription
factor NF-&amp;#x03BA;B has multiple target genes. We have mathematically recast the previously published model of I&amp;#x03BA;B, NF-&amp;#x03BA;B by Hoffmann et al. (2002) as discrete reaction systems. We
have used a stochastic algorithm to compare the results when there are large and small
numbers of molecules available in a finite volume for each protein. Our results for a small
number of molecules show that with the continuous presence of stimulation, nuclear NF-&amp;#x03BA;B oscillates continuously in every individual cell rather than damping as observed in cell population results. This characteristic of the system is missed when averaged behavior is studied.</description><Author>Anamika Sarkar, Marina Meila, and Robert B. Franza</Author><copyright>Copyright &amp;#x00A9; 2007 Anamika Sarkar et al. All rights reserved.</copyright></item><item><title>Extraction of Protein Interaction Data: A Comparative Analysis of Methods in Use</title><link>http://www.hindawi.com/journals/bsb/2007/053096/abs/</link><description>Several natural language processing tools, both commercial and freely available, are used to extract protein interactions from publications. Methods used by these tools range from pattern matching to dynamic programming, each with its own recall and precision rates. A methodical survey of these tools in comparison to manual analysis, keeping in mind the minimum interaction information a researcher would need, has not been carried out. We compared data generated using some of the selected NLP tools with manually curated protein interaction data (PathArt and IMaps) to determine the recall and precision rates. The rates were found to be lower than the published scores when a normalized definition for interaction is considered. Each data point captured wrongly or not picked up by the tool was analyzed. Our evaluation brings forth critical failures of NLP tools and provides pointers for the development of an ideal NLP tool.</description><Author>Hena Jose, Thangavel Vadivukarasi, and Jyothi Devakumar</Author><copyright>Copyright &amp;#x00A9; 2007 Hena Jose et al. All rights reserved.</copyright></item><item><title>Question Processing and Clustering in INDOC: A Biomedical Question Answering System</title><link>http://www.hindawi.com/journals/bsb/2007/028576/abs/</link><description>The exponential growth in the volume of publications in the biomedical domain has made it impossible for an individual to keep pace with the advances. Even though evidence-based medicine has gained wide acceptance, physicians are unable to access the relevant information in the required time, leaving most of the questions unanswered. This accentuates the need for fast and accurate biomedical question answering systems. In this paper we introduce INDOC&amp;#8212;a biomedical question answering system based on novel ideas of indexing and extracting the answer to the questions posed. INDOC displays the results in clusters to help the user arrive at the most relevant set of documents quickly. Evaluation was done against the standard OHSUMED test collection. Our system achieves high accuracy and minimizes user effort.</description><Author>Parikshit Sondhi, Purushottam Raj, V. Vinod Kumar, and Ankush Mittal</Author><copyright>Copyright &amp;#x00A9; 2007 Parikshit Sondhi et al. All rights reserved.</copyright></item><item><title>Decorrelation of the True and Estimated Classifier Errors in High-Dimensional Settings</title><link>http://www.hindawi.com/journals/bsb/2007/038473/abs/</link><description>The aim of many microarray experiments is to build discriminatory diagnosis and prognosis models. Given the huge number of features and the small number of examples, model validity, which refers to the precision of error estimation, is a critical issue. Previous
studies have addressed this issue via the deviation distribution (estimated error minus true error), in particular, the deterioration of cross-validation precision in high-dimensional
settings where feature selection is used to mitigate the peaking phenomenon (overfitting). Because classifier design is based upon random samples, both the true and estimated errors are sample-dependent random variables, and one would expect a loss of precision if the estimated and true errors are not well correlated, so that natural questions arise as to the degree of correlation and the manner in which lack of correlation impacts error estimation. We demonstrate the effect of correlation on error precision via a decomposition of the variance of the deviation distribution, observe that the correlation is often severely decreased in high-dimensional settings, and show that the effect of high dimensionality on error estimation tends to result more from its decorrelating effects than from its impact on the
variance of the estimated error. We consider the correlation between the true and estimated errors under different experimental conditions using both synthetic and real data, several feature-selection methods, different classification rules, and three commonly used error estimators (leave-one-out cross-validation, k-fold cross-validation, and .632 bootstrap). Moreover, three scenarios are considered: (1) feature selection, (2) known-feature set, and (3) all features. Only the first is of practical interest; however, the other two are needed for comparison purposes. We will observe that the true and estimated errors tend to be much more correlated in the case of a known feature set than with either feature selection or using all features, with the better correlation between the latter two showing no general trend, but differing for different models.</description><Author>Blaise Hanczar, Jianping Hua, and Edward R. Dougherty</Author><copyright>Copyright &amp;#x00A9; 2007 Blaise Hanczar et al. All rights reserved.</copyright></item><item><title>Genome-Wide Analysis of Intergenic Regions of Mycobacterium tuberculosis H37Rv Using Affymetrix GeneChips</title><link>http://www.hindawi.com/journals/bsb/2007/023054/abs/</link><description>Sequencing the complete genome of Mycobacterium tuberculosis H37Rv is a major milestone in the genome project and it sheds new light on our fight against tuberculosis. The genome contains around 4000 genes (protein-coding sequences)
in the original genome annotation. A subsequent reannotation of the genome has added 80 more genes. However, we have found that the intergenic regions can exhibit expression signals, as evidenced by microarray hybridization. It is then reasonable to suspect that there are unidentified genes in these regions. We conducted a genome-wide analysis using the Affymetrix GeneChip to explore genes contained in the intergenic sequences of the M. tuberculosis H37Rv genome. A working criterion for potential protein-coding genes was based on bioinformatics, consisting of the gene structure, protein coding potential, and presence of ortholog evidence. The bioinformatics criteria in conjunction with transcriptional evidence revealed potential genes with a specific function, such as a DNA-binding protein in the CopG family and a nickel-binding GTPase, as well as hypothetical proteins that had not been reported in the H37Rv genome. This study further demonstrated that microarray-based transcriptional evidence would facilitate genome-wide gene finding, and is also the first report concerning intergenic expression in the M. tuberculosis genome.</description><Author>Li M. Fu and Thomas M. Shinnick</Author><copyright>Copyright &amp;#x00A9; 2007 Li M. Fu and Thomas M. Shinnick. All rights reserved.</copyright></item><item><title>Computational Methods for Estimation of Cell Cycle Phase Distributions of Yeast Cells</title><link>http://www.hindawi.com/journals/bsb/2007/046150/abs/</link><description>Two computational methods for estimating the cell cycle phase distribution of
a budding yeast (Saccharomyces cerevisiae) cell population are presented. The first one is a nonparametric method that is based on the analysis of DNA content in the individual cells of the population. The DNA content is measured with a
fluorescence-activated cell sorter (FACS). The second method is based on budding
index analysis. An automated image analysis method is presented for the task
of detecting the cells and buds. The proposed methods can be used to obtain
quantitative information on the cell cycle phase distribution of a budding yeast
S. cerevisiae population. They therefore provide a solid basis for obtaining the complementary information needed in deconvolution of gene expression data. As a
case study, both methods are tested with data that were obtained in a time series
experiment with S. cerevisiae. The details of the time series experiment as well as the image and FACS data obtained in the experiment can be found in the online
additional material at http://www.cs.tut.fi/sgn/csb/yeastdistrib/.</description><Author>Antti Niemist&amp;#246;, Matti Nykter, Tommi Aho, Henna Jalovaara, Kalle Marjanen, Miika Ahdesm&amp;#228;ki, Pekka Ruusuvuori, Mikko Tiainen, Marja-Leena Linne, and Olli Yli-Harja</Author><copyright>Copyright &amp;#x00A9; 2007 Antti Niemist&amp;#246; et al. All rights reserved.</copyright></item></channel></rss>
