﻿<?xml version="1.0" encoding="utf-8"?><rss version="2.0"><channel><title>Advances in Bioinformatics</title><link>http://www.hindawi.com</link><description>The latest articles from Hindawi Publishing Corporation</description><copyright>&amp;#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright><item><title>Testing the Coding Potential of Conserved Short Genomic Sequences</title><link>http://www.hindawi.com/journals/abi/2010/287070.html</link><description>Proposed is a procedure to test whether a genomic sequence contains coding DNA, called a coding potential region. The procedure tests the coding potential of conserved short genomic sequence, in which the assumptions on the probability models of gene structures
are relaxed. Thus, it is expected to provide additional candidate regions that contain coding
DNAs to the current genomic database. The procedure was applied to the set of highly conserved human-mouse sequences in the genome database at the University of California at Santa Cruz. For sequences containing
RefSeq coding exons, the procedure detected 91.3&amp;#37; of the regions as having coding potential in this
set, which covers 83&amp;#37; of the human RefSeq coding exons, at a 2.6&amp;#37; false positive rate. The
procedure detected 12,688 novel short regions with coding potential at the false discovery
rate &amp;#60;0.05; 65.7&amp;#37; of the novel regions are between annotated genes.</description><Author>Jing Wu</Author><copyright>&amp;#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>Network Properties for Ranking Predicted miRNA Targets in Breast Cancer</title><link>http://www.hindawi.com/journals/abi/2009/182689.html</link><description>MicroRNAs control the expression of their target genes by translational repression and transcriptional cleavage. They are involved in various biological processes including development and progression of cancer. To uncover the biological role of miRNAs it is important to identify their target genes. The small number of experimentally validated target genes makes computer prediction methods very important. However, state-of-the-art prediction tools result in a great number of putative targets with an unpredictable number of false positives. In this paper, we propose and evaluate two approaches for ranking the biological relevance of putative targets of miRNAs which are associated with breast cancer.</description><Author>J&amp;#246;rg Linde, Bj&amp;#246;rn Olsson, and Zelmina Lubovac</Author><copyright>&amp;#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>Pathway-Based Feature Selection Algorithm for Cancer Microarray Data</title><link>http://www.hindawi.com/journals/abi/2009/532989.html</link><description>Classification of cancers based on gene expressions produces better accuracy
when compared to that of the clinical markers. Feature selection improves
the accuracy of these classification algorithms by reducing the chance
of overfitting caused by the large number of features. We develop a
new feature selection method called Biological Pathway-based Feature Selection (BPFS) for microarray data. Unlike most of the existing methods,
our method integrates signaling and gene regulatory pathways with gene
expression data to minimize the chance of overfitting of the method and to
improve the test accuracy. Thus, BPFS selects a biologically meaningful feature
set that is minimally redundant. Our experiments on published breast
cancer datasets demonstrate that all of the top 20 genes found by our method
are associated with cancer. Furthermore, the classification accuracy of our
signature is up to 18&amp;#37; better than that of van &amp;#39;t Veer&amp;#39;s 70-gene signature,
and up to 8&amp;#37; better than that of the best published feature selection
method, I-RELIEF.</description><Author>Nirmalya Bandyopadhyay, Tamer Kahveci, Steve Goodison, Y. Sun, and Sanjay Ranka</Author><copyright>&amp;#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>Evolution and Diversity of the Human Hepatitis D Virus Genome</title><link>http://www.hindawi.com/journals/abi/2010/323654.html</link><description>Human hepatitis delta virus (HDV) is the smallest RNA virus in terms of genome size. The HDV genome is divided into a viroid-like sequence and a protein-coding sequence, which could have originated from different sources; the HDV genome was eventually constituted through RNA recombination. The genome subsequently diversified through accumulation of mutations selected by interactions between the mutated RNA and proteins with host factors to successfully form the infectious virions. Therefore, we propose that the conservation of the HDV nucleotide sequence is highly related to its functionality. Genome analysis of known HDV isolates shows that the C-terminal coding sequences of large delta antigen (LDAg) have the highest diversity among the regions of the protein-coding sequences, but variants that still retain the biological functionality to interact with the heavy chain of clathrin can be selected and maintained. Since viruses interact with many host factors, including escaping the host immune response, designing a program to predict RNA genome evolution is a very challenging task.</description><Author>Chi-Ruei Huang and Szecheng J. Lo</Author><copyright>&amp;#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>Accurate and Scalable Techniques for the Complex/Pathway Membership Problem in Protein Networks</title><link>http://www.hindawi.com/journals/abi/2009/787128.html</link><description>A protein network shows physical interactions as well as functional associations. An important
usage of such networks is to discover unknown members of partially known complexes and
pathways. A number of methods exist for such analyses, and they can be divided into two main
categories based on their treatment of highly connected proteins. In this paper, we show that
methods that are not affected by the degree (number of linkages) of a protein give more accurate
predictions for certain complexes and pathways. We propose a network flow-based technique
to compute the association probability of a pair of proteins. We extend the proposed technique
using hierarchical clustering in order to scale well with the size of the proteome. We also show that
top-k queries are not suitable for a large number of cases, and threshold queries are more meaningful
in these cases. Network flow technique with clustering is able to optimize meaningful
threshold queries and answer them with high efficiency compared to a similar method that uses
Monte Carlo simulation.</description><Author>Orhan &amp;#199;amo&amp;#287;lu, Tolga Can, and Ambuj K. Singh</Author><copyright>&amp;#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>Recent Bioinformatics Advances in the Analysis of High Throughput Flow Cytometry Data</title><link>http://www.hindawi.com/journals/abi/2009/461763.html</link><description /><Author>Raphael Gottardo, Ryan R. Brinkman, George Luta, and Matt P. Wand</Author><copyright>&amp;#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>Algorithmic Assessment of Vaccine-Induced Selective Pressure and Its Implications on Future Vaccine Candidates</title><link>http://www.hindawi.com/journals/abi/2010/178069.html</link><description>Posttrial assessment of a vaccine&amp;#39;s selective pressure on infecting strains may be realized through a bioinformatic tool such as parsimony phylogenetic analysis. Following a failed gonococcal pilus vaccine trial of Neisseria gonorrhoeae, we conducted a phylogenetic analysis of pilin DNA and predicted peptide sequences from clinical isolates to assess the extent of the vaccine&amp;#39;s effect on the type of field strains that the volunteers contracted. Amplified pilin DNA sequences from infected vaccinees, placebo recipients, and vaccine specimens were phylogenetically analyzed. Cladograms show that the vaccine peptides have diverged substantially from their paternal isolate by clustering distantly from each other. Pilin genes of the field clinical isolates were heterogeneous, and their peptides produced clades comprised of vaccinated and placebo recipients&amp;#39; strains indicating that the pilus vaccine did not exert any significant selective pressure on gonorrhea field strains. Furthermore, sequences of the semivariable and hypervariable regions pointed out heterotachous rates of mutation and substitution.</description><Author>Mones S. 
Abu-Asab, Majid Laassri, and Hakima Amri</Author><copyright>&amp;#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>Synonymous Codon Usage Analysis of Thirty Two Mycobacteriophage Genomes</title><link>http://www.hindawi.com/journals/abi/2009/316936.html</link><description>Synonymous codon usage of protein coding genes of thirty two completely sequenced mycobacteriophage genomes was studied using multivariate statistical analysis. One of the major factors influencing codon usage is identified to be compositional bias. Codons ending with either C or G are preferred in highly expressed genes, among which C-ending codons are strongly preferred over G-ending codons. A strong negative correlation between the effective number of codons (Nc) and GC3s content was also observed, showing that the codon usage was affected by gene nucleotide composition. Translational selection is also identified to play a role in shaping the codon usage operative at the level of translational accuracy. A high level of heterogeneity is seen among and between the genomes. Gene length is also identified to influence the codon usage in 11 out of 32 phage genomes. Mycobacteriophage Cooper is identified to be the highly biased genome with better translation efficiency comparing well with the host specific tRNA genes.</description><Author>Sameer Hassan, Vasantha Mahalingam, and Vanaja Kumar</Author><copyright>&amp;#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>Tree-Based Methods for Discovery of Association between Flow Cytometry Data and Clinical Endpoints</title><link>http://www.hindawi.com/journals/abi/2009/235320.html</link><description>We demonstrate the application and comparative interpretations of three tree-based algorithms for the analysis of data arising from flow cytometry: classification and regression trees (CARTs), random forests (RFs), and logic regression (LR). 
Specifically, we consider the question of what best predicts CD4 T-cell recovery in HIV-1 infected persons starting antiretroviral therapy with CD4 count between 200 and 350&amp;#x2009;cell/&amp;#x03BC;L. A comparison to a more standard contingency table analysis is provided. While contingency table analysis and RFs provide information on the importance of each potential predictor variable, CART and LR offer additional insight into the combinations of variables that together are predictive of the outcome. In all cases considered, baseline CD3-DR-CD56+CD16+ emerges as an important predictor variable, while the tree-based approaches identify additional variables as potentially informative. Application of tree-based methods to our data suggests that a combination of baseline immune activation states, with emphasis on CD8 T-cell activation, may be a better predictor than any single T-cell/innate cell subset analyzed. Taken together, we show that tree-based methods can be successfully applied to flow cytometry data to better inform and discover associations that may not emerge in the context of a univariate analysis.</description><Author>M. Eliot, L. Azzoni, C. Firnhaber, W. Stevens, D. K. Glencross, I. Sanne, L. J. Montaner, and A. S. Foulkes</Author><copyright>&amp;#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>A Survey of Flow Cytometry Data Analysis Methods</title><link>http://www.hindawi.com/journals/abi/2009/584603.html</link><description>Flow cytometry (FCM) is widely used in health research and in treatment for a variety of tasks, such as in the diagnosis and monitoring of leukemia and lymphoma patients, providing the counts of helper-T lymphocytes needed
to monitor the course and treatment of HIV infection, the evaluation of peripheral blood hematopoietic stem cell
grafts, and many other diseases. In practice, FCM data analysis is performed manually, a process that requires an
inordinate amount of time and is error-prone, nonreproducible, nonstandardized, and not open for re-evaluation,
making it the most limiting aspect of this technology. This paper reviews state-of-the-art FCM data analysis
approaches using a framework introduced to report each of the components in a data analysis pipeline. Current
challenges and possible future directions in developing fully automated FCM data analysis tools are also outlined.</description><Author>Ali Bashashati and Ryan R. Brinkman</Author><copyright>&amp;#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>Automatic Clustering of Flow Cytometry Data with Density-Based Merging</title><link>http://www.hindawi.com/journals/abi/2009/686759.html</link><description>The ability of flow cytometry to allow fast single cell interrogation of a large number of cells has
made this technology ubiquitous and indispensable in the clinical and laboratory setting. A current limit to the potential of this technology is the lack of automated tools for analyzing the resulting data. We describe methodology and software to automatically identify cell populations in flow cytometry data. Our approach advances the paradigm of manually gating sequential two-dimensional projections of the data to a procedure that automatically produces gates based on statistical theory. Our approach is nonparametric and can reproduce nonconvex subpopulations that are known to occur in flow cytometry samples, but which cannot be produced with current parametric model-based approaches. We illustrate the methodology with a sample of mouse spleen and peritoneal cavity cells.</description><Author>Guenther Walther, Noah Zimmerman, Wayne Moore, David Parks, Stephen Meehan, Ilana Belitskaya, Jinhui Pan, and Leonore Herzenberg</Author><copyright>&amp;#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>Fluorescence Intensity Normalisation: Correcting for Time Effects in Large-Scale Flow Cytometric Analysis</title><link>http://www.hindawi.com/journals/abi/2009/476106.html</link><description>A next step to interpret the findings generated by genome-wide association studies is to associate molecular quantitative traits with disease-associated alleles. To this end, researchers are linking disease
risk alleles with gene expression quantitative trait loci (eQTL). However, gene expression at the
mRNA level is only an intermediate trait and flow cytometry analysis can provide more downstream
and biologically valuable protein level information in multiple cell subsets simultaneously using freshly
obtained samples. Because the throughput of flow cytometry is currently limited, experiments may
need to span over several weeks or months to obtain a sufficient sample size to demonstrate genetic
association. Therefore, normalisation methods are needed to control for technical variability and compare
flow cytometry data over an extended period of time. We show how the use of normalising
fluorospheres improves the repeatability of a cell surface CD25-APC mean fluorescence intensity phenotype
on CD4+ memory T cells. We investigate two types of normalising beads: broad spectrum and
spectrum matched. Lastly, we propose two alternative normalisation procedures that are usable in the
absence of normalising beads.</description><Author>Calliope A. Dendrou, Erik Fung, Laura Esposito, John A. Todd, Linda S. Wicker, and Vincent Plagnol</Author><copyright>&amp;#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>Assessing the Quality of Whole Genome Alignments in Bacteria</title><link>http://www.hindawi.com/journals/abi/2009/749027.html</link><description>Comparing genomes is an essential preliminary step to solve many problems in
biology. Matching long similar segments between two genomes is a precondition for their evolutionary, genetic, and genome rearrangement analyses. Though various comparison methods have been developed in recent years, a quantitative assessment of their performance is lacking. Here, we describe two families of assessment measures whose purpose is to evaluate bacteria-oriented comparison tools. The first measure is based on how well the genome segmentation fits the gene annotation of the studied organisms; the second uses the number of segments created by the segmentation and the percentage of the two genomes that are conserved. The effectiveness of the two measures is demonstrated by applying them to the results of genome comparison tools obtained on 41 pairs of bacterial species. Despite the difference in the nature of the two types of measurements, both show consistent results, providing insights into the subtle differences between the mapping tools.</description><Author>Firas Swidan and Ron Shamir</Author><copyright>&amp;#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>iFlow: A Graphical User Interface for Flow Cytometry Tools in Bioconductor</title><link>http://www.hindawi.com/journals/abi/2009/103839.html</link><description>Flow cytometry (FCM) has become an important analysis technology in health care and medical research, but the large volume of data produced by modern high-throughput experiments has presented significant new challenges for computational analysis tools. The development of an FCM software suite in Bioconductor represents one approach to overcome these challenges. In the spirit of the R programming language (Tree Star Inc., &amp;#8220;FlowJo&amp;#8221;), these tools are predominantly console-driven, allowing for programmatic access and rapid development of novel algorithms. Using this software requires a solid understanding of programming concepts and of the R language. 
However, some of these tools, in particular the statistical graphics and novel analytical methods, are also useful for nonprogrammers. To this end, we have developed an open source, extensible graphical user interface (GUI) iFlow, which sits on top of the Bioconductor backbone, enabling basic analyses by means of convenient graphical menus and wizards. We envision iFlow to be easily extensible in order to quickly integrate novel methodological developments.</description><Author>Kyongryun Lee, Florian Hahne, Deepayan Sarkar, and Robert Gentleman</Author><copyright>&amp;#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>Merging Mixture Components for Cell Population Identification in Flow Cytometry</title><link>http://www.hindawi.com/journals/abi/2009/247646.html</link><description>We present a framework for the identification of cell subpopulations in 
flow cytometry data based on merging mixture components using the 
flowClust methodology. We show that the cluster merging algorithm 
under our framework improves model fit and provides a better 
estimate of the number of distinct cell subpopulations than 
either Gaussian mixture models or flowClust, especially for 
complicated flow cytometry data distributions. Our framework 
allows the automated selection of the number of distinct cell 
subpopulations and we are able to identify cases where the 
algorithm fails, thus making it suitable for application in a 
high-throughput FCM analysis pipeline. Furthermore, we demonstrate a 
method for summarizing complex merged cell subpopulations in a 
simple manner that integrates with the existing flowClust 
framework and enables downstream data analysis. We demonstrate the 
performance of our framework on simulated and real FCM data. The 
software is available in the flowMerge package through the 
Bioconductor project.</description><Author>Greg Finak, Ali Bashashati, Ryan Brinkman, and Rapha&amp;#235;l Gottardo</Author><copyright>&amp;#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>Analysis of High-Throughput Flow Cytometry Data Using plateCore</title><link>http://www.hindawi.com/journals/abi/2009/356141.html</link><description>Flow cytometry (FCM) software packages from R/Bioconductor, such as flowCore and flowViz, serve as an open platform for development of new analysis tools and methods. We created plateCore, a new package that extends the functionality in these core packages to enable automated negative control-based gating and make the processing and analysis of plate-based data sets from high-throughput FCM screening experiments easier. plateCore was used to analyze data from a BD FACS CAP  screening experiment where five Peripheral Blood Mononucleocyte Cell (PBMC) samples were assayed for 189 different human cell surface markers. This same data set was also manually analyzed by a cytometry expert using the FlowJo data analysis software package (TreeStar, USA). We show that the expression values for markers characterized using the automated approach in plateCore are in good agreement with those from FlowJo, and that using plateCore allows for more reproducible analyses of FCM screening data.</description><Author>Errol Strain, Florian Hahne, Ryan R. Brinkman, and Perry Haaland</Author><copyright>&amp;#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>The KM-Algorithm Identifies Regulated Genes in Time Series Expression Data</title><link>http://www.hindawi.com/journals/abi/2009/284251.html</link><description>We present a statistical method to rank observed genes in gene expression time series experiments according to their degree of regulation in a biological process. 
The ranking may be used to focus on specific genes or to select meaningful subsets of genes from which gene regulatory networks can be built. Our approach is based on a state space model that incorporates hidden regulators of gene expression. Kalman (K) smoothing and maximum (M) likelihood estimation techniques are used to derive optimal estimates of the model parameters upon which a proposed regulation criterion is based. The statistical power of the proposed algorithm is investigated, and a real data set is analyzed for the purpose of identifying regulated genes in time dependent gene expression data. This statistical approach supports the concept that meaningful biological conclusions can be drawn from gene expression time series experiments by focusing on strong regulation rather than large expression values.</description><Author>Martina Bremer and R. W. Doerge</Author><copyright>&amp;#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>Bridging the Divide between Manual Gating and Bioinformatics with the Bioconductor Package flowFlowJo</title><link>http://www.hindawi.com/journals/abi/2009/809469.html</link><description>In flow cytometry, different cell types are usually selected or &amp;#8220;gated&amp;#8221; by a series of 1- or 2-dimensional geometric subsets of the measurements made on each cell. This is easily accomplished in commercial flow cytometry packages but it is difficult to work computationally with the results of this process. The ability to retrieve the results and work with both them and the raw data is critical; our experience points to the importance of bioinformatics tools that will allow us to examine gating robustness, combine manual and automated gating, and perform exploratory data analysis. 
To provide this capability, we have developed a Bioconductor package called flowFlowJo that can import gates defined by the commercial package FlowJo and work with them in a manner consistent with the other flow packages in Bioconductor. We present this package and illustrate some of the ways in which it can be used.</description><Author>John J. Gosink, Gary D. Means, William A. Rees, Cheng Su, and Hugh A. Rand</Author><copyright>&amp;#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>FlowFP: A Bioconductor Package for Fingerprinting Flow Cytometric Data</title><link>http://www.hindawi.com/journals/abi/2009/193947.html</link><description>A new software package called flowFP for the analysis of flow cytometry data is introduced.  The package, which is tightly integrated with other Bioconductor software for analysis of flow cytometry, provides tools to transform raw flow cytometry data into a form suitable for direct input into conventional statistical analysis and empirical modeling software tools. The approach of flowFP is to generate a description of the multivariate probability distribution function of flow cytometry data in the form of a &amp;#8220;fingerprint.&amp;#8221; As such, it is independent of a presumptive functional form for the distribution, in contrast with model-based methods such as Gaussian Mixture Modeling. FlowFP is computationally efficient and able to handle extremely large flow cytometry data sets of arbitrary dimensionality. Algorithms and software implementation of the package are described. Use of the software is exemplified with applications to data quality control and to the automated classification of Acute Myeloid Leukemia.</description><Author>Wade T. Rogers and Herbert A. Holyst</Author><copyright>&amp;#169; 2010, Hindawi Publishing Corporation. 
All rights reserved.</copyright></item><item><title>A Combinatory Approach for Selecting Prognostic Genes in Microarray Studies of Tumour Survivals</title><link>http://www.hindawi.com/journals/abi/2009/480486.html</link><description>Unlike significant gene expression analysis, which looks for genes that are differentially regulated, feature selection in microarray-based prognostic gene expression analysis aims at finding a subset of marker genes that are not only differentially expressed but also informative for prediction. Unfortunately, feature selection in the microarray literature is dominated by the simple heuristic univariate gene filter paradigm that selects differentially expressed genes according to their statistical significance. We introduce a combinatory feature selection strategy that integrates differential gene expression analysis with the Gram-Schmidt process to identify prognostic genes that are both statistically significant and highly informative for predicting tumour survival outcomes. Empirical application to leukemia and ovarian cancer survival data through within- and cross-study validations shows that the feature space can be largely reduced while achieving improved testing performance.</description><Author>Qihua Tan, Mads Thomassen, Kirsten M. Jochumsen, Ole Mogensen, Kaare Christensen, and Torben A. Kruse</Author><copyright>&amp;#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>Tumor Classification Using High-Order Gene Expression Profiles Based on Multilinear ICA</title><link>http://www.hindawi.com/journals/abi/2009/926450.html</link><description>Motivation. Independent Components Analysis (ICA) maximizes the statistical independence of the representational components of a training gene expression profile (GEP) ensemble, but it cannot distinguish relations between the different factors, or different modes, and it is not applicable to high-order GEP data mining. 
In order to generalize ICA, we introduce Multilinear-ICA and apply it to tumor classification using high-order GEP. Firstly, we introduce the basic concepts and operations of tensors and recommend the Support Vector Machine (SVM) classifier and Multilinear-ICA. Secondly, the higher-scoring genes of the original high-order GEP are selected using t-statistics and tabulated as tensors. Thirdly, the tensors are processed by Multilinear-ICA. Finally, the SVM is used to classify the tumor subtypes. Results. To show the validity of the proposed method, we apply it to tumor classification using high-order GEP. Though we only use three datasets, the experimental results show that the method is effective and feasible. Through this survey, we hope to gain some insight into the problem of high-order GEP tumor classification, in aid of further developing more effective tumor classification algorithms.</description><Author>Ming-gang Du, Shan-Wen Zhang, and Hong Wang</Author><copyright>&amp;#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>The FAST-AIMS Clinical Mass Spectrometry Analysis System</title><link>http://www.hindawi.com/journals/abi/2009/598241.html</link><description>Within clinical proteomics, mass spectrometry analysis of biological samples is emerging as an important high-throughput technology, capable of producing powerful diagnostic and prognostic models and identifying important disease biomarkers. As interest in this area grows, and the number of such proteomics datasets continues to increase, the need has developed for efficient, comprehensive, reproducible methods of mass spectrometry data analysis by both experts and nonexperts. We have designed and implemented a stand-alone software system, FAST-AIMS, which seeks to meet this need through automation of data preprocessing, feature selection, classification model generation, and performance estimation. 
FAST-AIMS is an efficient and user-friendly stand-alone software for predictive analysis of mass spectrometry data. The present resource review paper will describe the features and use of the FAST-AIMS system. The system is freely available for download for noncommercial use.</description><Author>Nafeh Fananapazir, Alexander Statnikov, and Constantin F. Aliferis</Author><copyright>&amp;#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>A Pathway Analysis Tool for Analyzing Microarray Data of Species with Low Physiological Information</title><link>http://www.hindawi.com/journals/abi/2008/719468.html</link><description>Pathway information provides insight into the biological processes underlying microarray data. Pathway information is widely available for humans and laboratory animals in databases through the internet, but less for other species, for example, livestock. Many software packages use species-specific gene IDs that cannot handle genomics data from other species. We developed a species-independent method to search pathways databases to analyse microarray data. Three PERL scripts were developed that use the names of the genes on the microarray. (1) Add synonyms of gene names by searching the Gene Ontology (GO) database. (2) Search the Kyoto Encyclopaedia of Genes and Genomes (KEGG) database for pathway information using this GO-enriched gene list. (3) Combine the pathway data with the microarray data and visualize the results using color codes indicating regulation. To demonstrate the power of the method, we used a previously reported chicken microarray experiment investigating line-specific reactions to Salmonella infection as an example.</description><Author>M. F. W. te Pas, S. van Hemert, B. Hulsegge, A. J. W. Hoekman, M. H. Pool, J. M. J. Rebel, and M. A. Smits</Author><copyright>&amp;#169; 2010, Hindawi Publishing Corporation. 
All rights reserved.</copyright></item><item><title>NCR-PCOPGene: An Exploratory Tool for Analysis of Sample-Classes Effect on Gene-Expression Relationships</title><link>http://www.hindawi.com/journals/abi/2008/789026.html</link><description>Background. Microarray technology is so expensive and powerful that it is essential to extract maximum value from microarray data. Our tools allow researchers to test and formulate from a hypothesis to entire models. Results. The objective of the NCRPCOPGene is to study the relationships among gene expressions under different conditions, to classify these conditions, and to study their effect on the different relationships. The web application makes it easier to define the sample classes, grouping the microarray experiments either by using (a) biological, statistical, or any other previous knowledge or (b) their effect on the expression relationship maintained among specific genes of interest. By means of the type (a) class definition, the researcher can add biological information to the gene-expression relationships. The type (b) class definition allows for linking genes correlated neither linearly nor nonlinearly. Conclusions. The PCOPGene tools are especially suitable for microarrays with large sample series. This application helps to identify cellular states and the genes involved in it in a flexible way. The application takes advantage of the ability of our system to relate gene expressions; even when these relationships are noncontinuous and cannot be found using linear or nonlinear analytical methods.</description><Author>Juan Cedano, Mario Huerta, and Enrique Querol</Author><copyright>&amp;#169; 2010, Hindawi Publishing Corporation. 
All rights reserved.</copyright></item><item><title>Metagenome Fragment Classification Using N-Mer Frequency Profiles</title><link>http://www.hindawi.com/journals/abi/2008/205969.html</link><description>A vast amount of microbial sequencing data is being generated through large-scale projects in ecology, agriculture, and human health. Efficient high-throughput methods are needed to analyze the massive amounts of metagenomic data, that is, all DNA present in an environmental sample. A major obstacle in metagenomics is the inability to obtain accuracy using technology that yields short reads. We construct the unique N-mer frequency profiles of 635 microbial genomes publicly available as of February 2008. These profiles are used to train a naive Bayes classifier (NBC) that can be used to identify the genome of any fragment. We show that our method is comparable to BLAST for small 25 bp fragments but does not have the ambiguity of BLAST&amp;#39;s tied top scores. We demonstrate that this approach is scalable to identify any fragment from hundreds of genomes. It also performs quite well at the strain, species, and genera levels and achieves strain resolution despite classifying ubiquitous genomic fragments (gene and nongene regions). Cross-validation analysis demonstrates that species-accuracy achieves 90&amp;#37; for highly-represented species containing an average of 8 strains. We demonstrate that such a tool can be used on the Sargasso Sea dataset, and our analysis shows that NBC can be further enhanced.</description><Author>Gail Rosen, Elaine Garbarine, Diamantino Caseiro, Robi Polikar, and Bahrad Sokhansanj</Author><copyright>&amp;#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>Genomic Promoter Analysis Predicts Functional Transcription Factor Binding</title><link>http://www.hindawi.com/journals/abi/2008/369830.html</link><description>Background. 
The computational identification of functional transcription factor binding sites (TFBSs) remains a major challenge of computational biology. 
Results. We have analyzed the conserved promoter sequences for the complete set of human RefSeq genes using our conserved transcription factor binding site (CONFAC) software. CONFAC identified 16296 human-mouse ortholog gene pairs, and of those pairs, 9107 genes contained conserved TFBS in the 3&amp;#x02009;kb proximal promoter and first intron. To attempt to predict in vivo occupancy of transcription factor binding sites, we developed a novel marginal effect isolator algorithm that builds upon Bayesian methods for multigroup TFBS filtering and predicted the in vivo occupancy of two transcription factors with an overall accuracy of 84&amp;#37;.
Conclusion. Our analyses show that integration of chromatin immunoprecipitation data with conserved TFBS analysis can be used to generate accurate predictions of functional TFBS. They also show that TFBS cooccurrence can be used to predict transcription factor binding to promoters in vivo.</description><Author>J. Sunil Rao, Suresh Karanam, Colleen D. McCabe, and Carlos S. Moreno</Author><copyright>&amp;#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>Comparing Quantitative Trait Loci and Gene Expression Data</title><link>http://www.hindawi.com/journals/abi/2008/719818.html</link><description>We develop methods to compare the positions of quantitative trait loci (QTL) with a set of genes selected by other methods, such as microarray experiments, from a sequenced genome. We apply our methods to QTL for addictive behavior in mouse, and a set of genes upregulated in a region of the brain associated with addictive behavior, the nucleus accumbens (NA). The association between the QTL and NA genes is not significantly stronger than expected by chance. However, chromosomes 2 and 16 do show strong associations suggesting that genes on these chromosomes might be associated with addictive behavior. The statistical methodology developed for this study can be applied to similar studies to assess the mutual information in microarray and QTL analyses.</description><Author>Bing Han, Naomi S. Altman, Jessica A. Mong, Laura Cousino Klein, Donald W. Pfaff, and David J. Vandenbergh</Author><copyright>&amp;#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>Genevestigator V3: A Reference Expression Database for the Meta-Analysis of Transcriptomes</title><link>http://www.hindawi.com/journals/abi/2008/420747.html</link><description>The Web-based software tool Genevestigator provides powerful tools for biologists to explore gene
expression across a wide variety of biological contexts. Its first releases, however, were limited by the scaling
ability of the system architecture, multiorganism data storage and analysis capability, and availability of
computationally intensive analysis methods. Genevestigator V3 is a novel meta-analysis system resulting
from new algorithmic and software development using a client/server architecture, large-scale manual
curation and quality control of microarray data for several organisms, and curation of pathway data for mouse
and Arabidopsis. In addition to improved querying features, Genevestigator V3 provides new tools to analyze
the expression of genes in many different contexts, to identify biomarker genes, to cluster genes into
expression modules, and to model expression responses in the context of metabolic and regulatory networks.
Being a reference expression database with user-friendly tools, Genevestigator V3 facilitates discovery
research and hypothesis validation.</description><Author>Tomas Hruz, Oliver Laule, Gabor Szabo, Frans Wessendorp, Stefan Bleuler, Lukas Oertle, Peter Widmayer, Wilhelm Gruissem, and Philip Zimmermann</Author><copyright>&amp;#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item><item><title>A Tutorial of the Poisson Random Field Model in Population Genetics</title><link>http://www.hindawi.com/journals/abi/2008/257864.html</link><description>Population genetics is the study of allele frequency changes driven by various evolutionary forces such as mutation, natural selection, and random genetic drift. Although natural selection is widely recognized as a bona fide phenomenon, the extent to which it drives evolution remains unclear and controversial. Various qualitative techniques, or so-called &amp;#8220;tests of neutrality&amp;#8221;, have been introduced to detect signatures of natural selection. A decade and a half ago, Stanley Sawyer and Daniel Hartl provided a mathematical framework, referred to as the Poisson random field (PRF), with which to determine quantitatively the intensity of selection on a particular gene or genomic region. The recent availability of large-scale genetic polymorphism data has sparked widespread interest in genome-wide investigations of natural selection. To that end, the original PRF model is of particular interest for geneticists and evolutionary genomicists. In this article, we will provide a tutorial of the mathematical derivation of the original Sawyer and Hartl PRF model.</description><Author>Praveen Sethupathy and Sridhar Hannenhalli</Author><copyright>&amp;#169; 2010, Hindawi Publishing Corporation. 
All rights reserved.</copyright></item><item><title>Automated Quantitative Assessment of Proteins&amp;#39; Biological Function in Protein Knowledge Bases</title><link>http://www.hindawi.com/journals/abi/2008/897019.html</link><description>Primary protein sequence data are archived in databases together with information regarding corresponding biological functions. In this respect, UniProt/Swiss-Prot is currently the most comprehensive collection and it is routinely cross-examined when trying to unravel the biological role of hypothetical proteins. Bioscientists frequently extract single entries and further evaluate those on a subjective basis. In lieu of a standardized procedure for scoring the existing knowledge regarding individual proteins, we report here a computer-assisted method that scores the present knowledge about any given Swiss-Prot entry. Applying this quantitative score allows the comparison of proteins with respect to their sequence yet highlights the comprehension of functional data. pfs analysis may also be applied for quality control of individual entries or for database management in order to rank entry listings.</description><Author>Gabriele Mayr, G&amp;#252;nter Lepperdinger, and Peter Lackner</Author><copyright>&amp;#169; 2010, Hindawi Publishing Corporation. All rights reserved.</copyright></item></channel></rss>