The Scientific World Journal: Bioinformatics The latest articles from Hindawi Publishing Corporation © 2014, Hindawi Publishing Corporation. All rights reserved. Molecular Phylogeny and Predicted 3D Structure of Plant β-D-N-Acetylhexosaminidase Sun, 20 Jul 2014 07:34:30 +0000 β-D-N-Acetylhexosaminidase, a family 20 glycosyl hydrolase, catalyzes the removal of β-1,4-linked N-acetylhexosamine residues from oligosaccharides and their conjugates. We constructed a phylogenetic tree of β-hexosaminidases to analyze the evolutionary history and predicted functions of plant hexosaminidases. Phylogenetic analysis reveals a complex evolutionary history of plant β-hexosaminidase that can be described by gene duplication events. The 3D structure of tomato β-hexosaminidase (β-Hex-Sl) was predicted by homology modeling using 1now as a template. Structural conformity studies of the best-fit model showed that more than 98% of the residues lie inside the favoured and allowed regions, whereas only 0.9% lie in the unfavourable region. The predicted 3D structure contains 531 amino acid residues with glycosyl hydrolase20b domain-I and glycosyl hydrolase20 superfamily domain-II, including the (β/α)8 barrel in the central part. The α and β contents of the modeled structure were found to be 33.3% and 12.2%, respectively. Eleven amino acids were found to be involved in the ligand-binding site; Asp(330) and Glu(331) could play important roles in enzyme-catalyzed reactions. The predicted model provides a structural framework that can act as a guide to develop a hypothesis for β-Hex-Sl mutagenesis experiments for exploring the functions of this class of enzymes in the plant kingdom. Md. Anowar Hossain and Hairul Azman Roslan Copyright © 2014 Md. Anowar Hossain and Hairul Azman Roslan. All rights reserved.
Protein Binding Site Prediction by Combining Hidden Markov Support Vector Machine and Profile-Based Propensities Mon, 14 Jul 2014 00:00:00 +0000 Identification of protein binding sites is critical for studying the function of proteins. In this paper, we proposed a method for protein binding site prediction, which combined the order profile propensities and the hidden Markov support vector machine (HM-SVM). This method applied the sequential labeling technique to the field of protein binding site prediction. The input features of HM-SVM include the profile-based propensities, the Position-Specific Score Matrix (PSSM), and Accessible Surface Area (ASA). When tested on different data sets, the proposed method showed promising results and outperformed some closely related methods by more than 10% in terms of AUC. Bin Liu, Bingquan Liu, Fule Liu, and Xiaolong Wang Copyright © 2014 Bin Liu et al. All rights reserved. acACS: Improving the Prediction Accuracy of Protein Subcellular Locations and Protein Classification by Incorporating the Average Chemical Shifts Composition Wed, 02 Jul 2014 06:48:12 +0000 The chemical shift is sensitive to changes in the local environment and can report structural changes. The structure information of a protein can be represented by the average chemical shifts (ACS) composition, which has been broadly applied for enhancing the prediction accuracy in protein subcellular locations and protein classification. However, different kinds of ACS composition can solve different problems. We established an online web server named acACS, which can convert secondary structure into average chemical shifts and then compose the vector representing a protein by using the auto covariance algorithm. Our solution is easy to use and can meet the needs of users. Guo-Liang Fan, Yan-Ling Liu, Yong-Chun Zuo, Han-Xue Mei, Yi Rang, Bao-Yan Hou, and Yan Zhao Copyright © 2014 Guo-Liang Fan et al. All rights reserved.
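The auto covariance step named in the acACS abstract can be illustrated with a short sketch. It assumes the commonly used lag-based definition, AC(lag) = 1/(N - lag) * sum over i of (x_i - mean)(x_{i+lag} - mean); the abstract does not confirm that acACS uses exactly this form. The function name, the example values, and the maximum lag are hypothetical.

```python
from statistics import fmean


def auto_covariance(values, max_lag):
    """Return [AC(1), ..., AC(max_lag)] for a per-residue numeric series.

    Assumed definition (not confirmed by the acACS abstract):
        AC(lag) = 1/(N - lag) * sum_i (x_i - mean) * (x_{i+lag} - mean)
    """
    n = len(values)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the series length")
    mean = fmean(values)
    centered = [v - mean for v in values]
    features = []
    for lag in range(1, max_lag + 1):
        # Sum products of centered values that are `lag` positions apart.
        total = sum(centered[i] * centered[i + lag] for i in range(n - lag))
        features.append(total / (n - lag))
    return features


# Hypothetical per-residue average chemical shift values (illustrative only).
shifts = [1.0, 2.0, 3.0, 4.0, 5.0]
vector = auto_covariance(shifts, max_lag=2)
```

The result is a fixed-length vector (one entry per lag) regardless of protein length, which is what makes this kind of encoding convenient as classifier input.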
Prediction of Four Kinds of Simple Supersecondary Structures in Protein by Using Chemical Shifts Wed, 18 Jun 2014 09:12:09 +0000 Knowledge of supersecondary structures can provide important information about the spatial structure of a protein. Some approaches have been developed for the prediction of protein supersecondary structure. However, the features used by these approaches are primarily based on amino acid sequences. In this study, a novel model is presented to predict protein supersecondary structure using chemical shift (CS) information derived from nuclear magnetic resonance (NMR) spectroscopy. Using these CSs as inputs to quadratic discriminant analysis (QD), we achieve an overall prediction accuracy of 77.3%, which is competitive with the same method for predicting supersecondary structures from amino acid compositions in threefold cross-validation. Moreover, our finding suggests that the combined use of different chemical shifts will influence the accuracy of prediction. Feng Yonge Copyright © 2014 Feng Yonge. All rights reserved. Nonlinear Quantitative Radiation Sensitivity Prediction Model Based on NCI-60 Cancer Cell Lines Tue, 17 Jun 2014 00:00:00 +0000 We proposed a nonlinear model to perform a novel quantitative radiation sensitivity prediction. We used the NCI-60 panel, which consists of nine different cancer types, as the platform to train our model. Important radiation therapy (RT) related genes were selected by significance analysis of microarrays (SAM). Orthogonal latent variables (LVs) were then extracted by the partial least squares (PLS) method as the new compressive input variables. Finally, a support vector machine (SVM) regression model was trained with these LVs to predict the SF2 (the surviving fraction of cells after a radiation dose of 2 Gy γ-ray) values of the cell lines.
Comparison with the published results showed significant improvement of the new method in various ways: (a) reducing the root mean square error (RMSE) of the radiation sensitivity prediction model from 0.20 to 0.011; and (b) improving prediction accuracy from 62% to 91%. To test the predictive performance of the gene signature, three different types of cancer patient datasets were used. Survival analysis across these different types of cancer patients strongly confirmed the clinical potential utility of the signature genes as a general prognosis platform. The gene regulatory network analysis identified six hub genes that are involved in canonical cancer pathways. Chunying Zhang, Luc Girard, Amit Das, Sun Chen, Guangqiang Zheng, and Kai Song Copyright © 2014 Chunying Zhang et al. All rights reserved. An Empirical Study of Different Approaches for Protein Classification Sun, 15 Jun 2014 12:03:31 +0000 Many domains would benefit from reliable and efficient systems for automatic protein classification. An area of particular interest in recent studies on automatic protein classification is the exploration of new methods for extracting features from a protein that work well for specific problems. These methods, however, are not generalizable and have proven useful in only a few domains. Our goal is to evaluate several feature extraction approaches for representing proteins by testing them across multiple datasets. Different types of protein representations are evaluated: those starting from the position specific scoring matrix of the proteins (PSSM), those derived from the amino-acid sequence, two matrix representations, and features taken from the 3D tertiary structure of the protein. We also test new variants of proteins descriptors. We develop our system experimentally by comparing and combining different descriptors taken from the protein representations. 
Each descriptor is used to train a separate support vector machine (SVM), and the results are combined by sum rule. Some stand-alone descriptors work well on some datasets but not on others. Through fusion, the different descriptors provide a performance that works well across all tested datasets, in some cases performing better than the state-of-the-art. Loris Nanni, Alessandra Lumini, and Sheryl Brahnam Copyright © 2014 Loris Nanni et al. All rights reserved. Study of Query Expansion Techniques and Their Application in the Biomedical Information Retrieval Sun, 02 Mar 2014 00:00:00 +0000 Information Retrieval focuses on finding documents whose content matches with a user query from a large document collection. As formulating well-designed queries is difficult for most users, it is necessary to use query expansion to retrieve relevant information. Query expansion techniques are widely applied for improving the efficiency of the textual information retrieval systems. These techniques help to overcome vocabulary mismatch issues by expanding the original query with additional relevant terms and reweighting the terms in the expanded query. In this paper, different text preprocessing and query expansion approaches are combined to improve the documents initially retrieved by a query in a scientific documental database. A corpus belonging to MEDLINE, called Cystic Fibrosis, is used as a knowledge source. Experimental results show that the proposed combinations of techniques greatly enhance the efficiency obtained by traditional queries. A. R. Rivas, E. L. Iglesias, and L. Borrajo Copyright © 2014 A. R. Rivas et al. All rights reserved. A Comparative Analysis of Synonymous Codon Usage Bias Pattern in Human Albumin Superfamily Thu, 20 Feb 2014 11:54:35 +0000 Synonymous codon usage bias is an inevitable phenomenon in organismic taxa across the three domains of life. 
Though the frequency of codon usage is not equal across species and within the genome of the same species, the phenomenon is nonrandom and tissue-specific. Several factors such as GC content, nucleotide distribution, protein hydropathy, protein secondary structure, and translational selection are reported to contribute to codon usage preference. The synonymous codon usage patterns can be helpful in revealing the expression pattern of genes as well as the evolutionary relationship between the sequences. In this study, synonymous codon usage bias patterns were determined for the evolutionarily close proteins of the albumin superfamily, namely, albumin, α-fetoprotein, afamin, and vitamin D-binding protein. Our study demonstrated that the genes of the four albumin superfamily members have low GC content and high values of effective number of codons (ENC), suggesting high expressivity of these genes and less bias in codon usage preferences. This study also provided evidence that the albumin superfamily members are not subjected to mutational selection pressure. Hoda Mirsafian, Adiratna Mat Ripen, Aarti Singh, Phaik Hwan Teo, Amir Feisal Merican, and Saharuddin Bin Mohamad Copyright © 2014 Hoda Mirsafian et al. All rights reserved. Biomedical Informatics and Computational Biology for High-Throughput Data Analysis Wed, 12 Feb 2014 08:06:41 +0000 Bairong Shen, Jian Ma, Jiajun Wang, and Junbai Wang Copyright © 2014 Bairong Shen et al. All rights reserved. Tumor Necrosis Factor-α as a Diagnostic Marker for Neonatal Sepsis: A Meta-Analysis Tue, 11 Feb 2014 11:39:28 +0000 Neonatal sepsis (NS) is an important cause of mortality in newborns and a life-threatening disorder in infants. This meta-analysis was performed to investigate the diagnostic value of the tumor necrosis factor-α (TNF-α) test in NS. Studies were searched in PUBMED, EMBASE, and the Cochrane Library between March 1994 and August 2013.
In total, 347 studies were collected, of which 15 articles and 23 trials were selected for our meta-analysis of NS. The TNF-α test showed moderate accuracy in the diagnosis of NS both in early-onset neonatal sepsis (sensitivity = 0.66, specificity = 0.76, Q* = 0.74) and in late-onset neonatal sepsis (sensitivity = 0.68, specificity = 0.89, Q* = 0.87). We also found that the northern hemisphere group had higher sensitivity (0.84) and specificity (0.83). A diagnostic OR analysis found that the study population may be the major reason for the heterogeneity. Accordingly, we suggest that TNF-α is also a valuable marker in the diagnosis of NS. Bokun Lv, Jie Huang, Haining Yuan, Wenying Yan, Guang Hu, and Jian Wang Copyright © 2014 Bokun Lv et al. All rights reserved. A Neural-Network-Based Approach to White Blood Cell Classification Thu, 30 Jan 2014 00:00:00 +0000 This paper presents a new white blood cell classification system for the recognition of five types of white blood cells. We propose a new segmentation algorithm for the segmentation of white blood cells from smear images. The core idea of the proposed segmentation algorithm is to find a discriminating region of white blood cells in the HSI color space. Pixels with color lying in the discriminating region, described by an ellipsoidal region, are regarded as the nucleus and granules of the cytoplasm of a white blood cell. Then, through a further morphological process, we can segment a white blood cell from a smear image. Three kinds of features (i.e., geometrical features, color features, and LDP-based texture features) are extracted from the segmented cell. These features are fed into three different kinds of neural networks to recognize the types of the white blood cells. To test the effectiveness of the proposed white blood cell classification system, a total of 450 white blood cell images were used. The highest overall correct recognition rate reached 99.11%.
Simulation results showed that the proposed white blood cell classification system was very competitive to some existing systems. Mu-Chun Su, Chun-Yen Cheng, and Pa-Chun Wang Copyright © 2014 Mu-Chun Su et al. All rights reserved. Genome-Wide Characterisation of Gene Expression in Rice Leaf Blades at 25°C and 30°C Wed, 29 Jan 2014 09:39:07 +0000 Rice growth is greatly affected by temperature. To examine how temperature influences gene expression in rice on a genome-wide basis, we utilised recently compiled next-generation sequencing datasets and characterised a number of RNA-sequence transcriptome samples in rice seedling leaf blades at 25°C and 30°C. Our analysis indicated that 50.4% of all genes in the rice genome (28,296/56,143) were expressed in rice samples grown at 25°C, whereas slightly fewer genes (50.2%; 28,189/56,143) were expressed in rice leaf blades grown at 30°C. Among the genes that were expressed, approximately 3% were highly expressed, whereas approximately 65% had low levels of expression. Further examination demonstrated that 821 genes had a twofold or higher increase in expression and that 553 genes had a twofold or greater decrease in expression at 25°C. Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses suggested that the ribosome pathway and multiple metabolic pathways were upregulated at 25°C. Based on these results, we deduced that gene expression at both transcriptional and translational levels was stimulated at 25°C, perhaps in response to a suboptimal temperature condition. Finally, we observed that temperature markedly regulates several super-families of transcription factors, including bZIP, MYB, and WRKY. Zhi-guo E, Lei Wang, Ryan Qin, Haihong Shen, and Jianhua Zhou Copyright © 2014 Zhi-guo E et al. All rights reserved. 
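The percentages and the twofold criterion reported in the rice abstract can be reproduced with a small sketch. The gene counts come from the abstract; the function names, the pseudocount, and the idea of comparing one expression value per temperature are assumptions made only for illustration and do not describe the authors' pipeline.

```python
import math


def expressed_fraction(n_expressed, n_total):
    """Percentage of annotated genes with detectable expression."""
    return 100.0 * n_expressed / n_total


def classify_fold_change(expr_25c, expr_30c, threshold=2.0, pseudocount=1.0):
    """Label a gene by its 25°C versus 30°C expression ratio.

    The pseudocount is an assumption (not from the abstract); it avoids
    division by zero for genes with no reads at one temperature.
    """
    ratio = (expr_25c + pseudocount) / (expr_30c + pseudocount)
    log2_ratio = math.log2(ratio)
    cutoff = math.log2(threshold)  # twofold corresponds to |log2 ratio| >= 1
    if log2_ratio >= cutoff:
        return "up_at_25C"
    if log2_ratio <= -cutoff:
        return "down_at_25C"
    return "unchanged"


# Gene counts taken from the abstract: expressed genes / all rice genes.
pct_25c = round(expressed_fraction(28296, 56143), 1)
pct_30c = round(expressed_fraction(28189, 56143), 1)
```

Working on the log2 scale makes the increase and decrease thresholds symmetric, so a twofold decrease is treated the same way as a twofold increase.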
Ratsnake: A Versatile Image Annotation Tool with Application to Computer-Aided Diagnosis Mon, 27 Jan 2014 19:11:15 +0000 Image segmentation and annotation are key components of image-based medical computer-aided diagnosis (CAD) systems. In this paper we present Ratsnake, a publicly available generic image annotation tool providing annotation efficiency, semantic awareness, versatility, and extensibility, features that can be exploited to transform it into an effective CAD system. In order to demonstrate this unique capability, we present its novel application for the evaluation and quantification of salient objects and structures of interest in kidney biopsy images. Accurate annotation identifying and quantifying such structures in microscopy images can provide an estimation of pathogenesis in obstructive nephropathy, which is a rather common disease with severe implication in children and infants. However a tool for detecting and quantifying the disease is not yet available. A machine learning-based approach, which utilizes prior domain knowledge and textural image features, is considered for the generation of an image force field customizing the presented tool for automatic evaluation of kidney biopsy images. The experimental evaluation of the proposed application of Ratsnake demonstrates its efficiency and effectiveness and promises its wide applicability across a variety of medical imaging domains. D. K. Iakovidis, T. Goudas, C. Smailis, and I. Maglogiannis Copyright © 2014 D. K. Iakovidis et al. All rights reserved. Verification and Optimal Control of Context-Sensitive Probabilistic Boolean Networks Using Model Checking and Polynomial Optimization Thu, 23 Jan 2014 14:19:55 +0000 One of the significant topics in systems biology is to develop control theory of gene regulatory networks (GRNs). In typical control of GRNs, expression of some genes is inhibited (activated) by manipulating external stimuli and expression of other genes. 
It is expected to apply control theory of GRNs to gene therapy technologies in the future. In this paper, a control method using a Boolean network (BN) is studied. A BN is widely used as a model of GRNs, and gene expression is expressed by a binary value (ON or OFF). In particular, a context-sensitive probabilistic Boolean network (CS-PBN), which is one of the extended models of BNs, is used. For CS-PBNs, the verification problem and the optimal control problem are considered. For the verification problem, a solution method using the probabilistic model checker PRISM is proposed. For the optimal control problem, a solution method using polynomial optimization is proposed. Finally, a numerical example on the WNT5A network, which is related to melanoma, is presented. The proposed methods provide us useful tools in control theory of GRNs. Koichi Kobayashi and Kunihiko Hiraishi Copyright © 2014 Koichi Kobayashi and Kunihiko Hiraishi. All rights reserved. A Web-Server of Cell Type Discrimination System Wed, 22 Jan 2014 16:08:57 +0000 Discriminating cell types is a daily request for stem cell biologists. However, there is not a user-friendly system available to date for public users to discriminate the common cell types, embryonic stem cells (ESCs), induced pluripotent stem cells (iPSCs), and somatic cells (SCs). Here, we develop WCTDS, a web-server of cell type discrimination system, to discriminate the three cell types and their subtypes like fetal versus adult SCs. WCTDS is developed as a top layer application of our recent publication regarding cell type discriminations, which employs DNA-methylation as biomarkers and machine learning models to discriminate cell types. Implemented by Django, Python, R, and Linux shell programming, run under Linux-Apache web server, and communicated through MySQL, WCTDS provides a friendly framework to efficiently receive the user input and to run mathematical models for analyzing data and then to present results to users. 
This framework is flexible and easy to extend for other applications. Therefore, WCTDS works as a user-friendly framework to discriminate cell types and subtypes, and it can also be extended to detect other cell types like cancer cells. Anyou Wang, Yan Zhong, Yanhua Wang, and Qianchuan He Copyright © 2014 Anyou Wang et al. All rights reserved. Discovery of Novel Inhibitors for Nek6 Protein through Homology Model Assisted Structure Based Virtual Screening and Molecular Docking Approaches Wed, 22 Jan 2014 13:02:48 +0000 Nek6 is a member of the NIMA (never in mitosis, gene A)-related serine/threonine kinase family that plays an important role in the initiation of mitotic cell cycle progression. This work is an attempt to emphasize the structural and functional relationship of Nek6 protein based on homology modeling and binding pocket analysis. The three-dimensional structure of Nek6 was constructed by molecular modeling studies, and the best model was further assessed by PROCHECK, ProSA, and ERRAT plot in order to analyze the quality and consistency of the generated model. The overall quality of the computed model showed 87.4% of amino acid residues in the favored region. A 3 ns molecular dynamics simulation confirmed that the structure was reliable and stable. Two lead compounds (Binding database ID: 15666, 18602) were retrieved through structure-based virtual screening and induced fit docking approaches as novel Nek6 inhibitors. Hence, we concluded that the potential compounds may act as new leads for the design of Nek6 inhibitors. P. Srinivasan, P. Chella Perumal, and A. Sudha Copyright © 2014 P. Srinivasan et al. All rights reserved. Protein-Protein Interactions Prediction Based on Iterative Clique Extension with Gene Ontology Filtering Wed, 22 Jan 2014 07:36:41 +0000 Cliques (maximal complete subnets) in a protein-protein interaction (PPI) network are an important resource used to analyze protein complexes and functional modules.
Clique-based methods of predicting PPI compensate for the data deficiencies of biological experiments. However, clique-based prediction methods depend only on the topology of the network. The false-positive and false-negative interactions in a network usually interfere with prediction. Therefore, we propose a method combining clique-based prediction and gene ontology (GO) annotations to overcome this shortcoming and improve the accuracy of predictions. According to different GO correcting rules, we generate two predicted interaction sets which guarantee the quality and quantity of predicted protein interactions. The proposed method is applied to the PPI network from the Database of Interacting Proteins (DIP), and most of the predicted interactions are verified by another biological database, BioGRID. The predicted protein interactions are appended to the original protein network, which leads to clique extension and shows the significance of biological meaning. Lei Yang and Xianglong Tang Copyright © 2014 Lei Yang and Xianglong Tang. All rights reserved. RRHGE: A Novel Approach to Classify the Estrogen Receptor Based Breast Cancer Subtypes Sun, 19 Jan 2014 13:52:52 +0000 Background. Breast cancer is the most common type of cancer among females, with a high mortality rate. It is essential to classify the estrogen receptor based breast cancer subtypes into correct subclasses, so that the right treatments can be applied to lower the mortality rate. Using gene signatures derived from gene interaction networks to classify breast cancers has proven to be more reproducible and can achieve higher classification performance. However, gene interaction networks usually contain many false-positive interactions that do not have any biological meaning. Therefore, it is a challenge to incorporate the reliability assessment of interactions when deriving gene signatures from gene interaction networks.
How to effectively extract gene signatures from available resources is critical to the success of cancer classification. Methods. We propose a novel method to measure and extract the reliable (biologically true or valid) interactions from gene interaction networks and incorporate the extracted reliable gene interactions into our proposed RRHGE algorithm to identify significant gene signatures from microarray gene expression data for classifying ER+ and ER− breast cancer samples. Results. The evaluation on real breast cancer samples showed that our RRHGE algorithm achieved higher classification accuracy than the existing approaches. Ashish Saini, Jingyu Hou, and Wanlei Zhou Copyright © 2014 Ashish Saini et al. All rights reserved. eFisioTrack: A Telerehabilitation Environment Based on Motion Recognition Using Accelerometry Sun, 12 Jan 2014 00:00:00 +0000 The growing demand for physical rehabilitation processes can result in the rising of costs and waiting lists, becoming a threat to healthcare services’ sustainability. Telerehabilitation solutions can help in this issue by discharging patients from points of care while improving their adherence to treatment. Sensing devices are used to collect data so that the physiotherapists can monitor and evaluate the patients’ activity in the scheduled sessions. This paper presents a software platform that aims to meet the needs of the rehabilitation experts and the patients along a physical rehabilitation plan, allowing its use in outpatient scenarios. It is meant to be low-cost and easy-to-use, improving patients and experts experience. We show the satisfactory results already obtained from its use, in terms of the accuracy evaluating the exercises, and the degree of users’ acceptance. We conclude that this platform is suitable and technically feasible to carry out rehabilitation plans outside the point of care. Daniel Ruiz-Fernandez, Oscar Marín-Alonso, Antonio Soriano-Paya, and Joaquin D. 
García-Pérez Copyright © 2014 Daniel Ruiz-Fernandez et al. All rights reserved. Prediction and Analysis of Surface Hydrophobic Residues in Tertiary Structure of Proteins Thu, 09 Jan 2014 12:51:05 +0000 The analysis of protein structures provides plenty of information about the factors governing the folding and stability of proteins, the preferred amino acids in the protein environment, the location of the residues in the interior/surface of a protein, and so forth. In general, hydrophobic residues such as Val, Leu, Ile, Phe, and Met tend to be buried in the interior, and polar side chains tend to be exposed to solvent. The present work depends on sequence as well as structural information of the protein and aims to understand the nature of hydrophobic residues on protein surfaces. It is based on a nonredundant data set of 218 monomeric proteins. The solvent accessibility of each protein was determined using NACCESS software. Homologous sequences were then obtained to assess how well solvent-exposed and buried hydrophobic residues are evolutionarily conserved, and confidence scores for being buried or solvent exposed were assigned to hydrophobic residues based on conservation scores and knowledge of the flanking regions of the hydrophobic residues. In the absence of a three-dimensional structure, the ability to predict surface accessibility of hydrophobic residues directly from the sequence is of great help in choosing the sites of chemical modification or specific mutations and in the studies of protein stability and molecular interactions. Shambhu Malleshappa Gowder, Jhinuk Chatterjee, Tanusree Chaudhuri, and Kusum Paul Copyright © 2014 Shambhu Malleshappa Gowder et al. All rights reserved.
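The first step described in the abstract above, separating buried from solvent-exposed hydrophobic residues, can be sketched as follows. The residue set is the one the abstract lists as examples. The 25% relative accessibility cutoff, the function name, and the toy sequence are assumptions for illustration; the input is assumed to be one relative accessibility percentage per residue, such as a tool like NACCESS could provide.

```python
# Val, Leu, Ile, Phe, Met: the example hydrophobic residues named in the abstract.
HYDROPHOBIC = {"V", "L", "I", "F", "M"}


def exposed_hydrophobic(sequence, rel_accessibility, cutoff=25.0):
    """Return (position, residue) pairs for solvent-exposed hydrophobic residues.

    rel_accessibility holds one relative accessibility percentage per residue.
    The 25% cutoff is an illustrative assumption, not a value from the abstract.
    Positions are 1-based.
    """
    if len(sequence) != len(rel_accessibility):
        raise ValueError("sequence and accessibility lists must have equal length")
    return [
        (index + 1, residue)
        for index, (residue, access) in enumerate(zip(sequence, rel_accessibility))
        if residue in HYDROPHOBIC and access > cutoff
    ]


# Toy input (hypothetical values, not taken from the 218-protein data set).
seq = "MKVLAF"
acc = [60.0, 80.0, 5.0, 30.0, 40.0, 10.0]
surface = exposed_hydrophobic(seq, acc)
```

In this toy case only Met1 and Leu4 are reported: Val3 and Phe6 fall below the cutoff, and Lys2 and Ala5 are not in the hydrophobic set used here.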
Large Scale Explorative Oligonucleotide Probe Selection for Thousands of Genetic Groups on a Computing Grid: Application to Phylogenetic Probe Design Using a Curated Small Subunit Ribosomal RNA Gene Database Mon, 06 Jan 2014 09:56:46 +0000 Phylogenetic Oligonucleotide Arrays (POAs) were recently adapted for studying the huge microbial communities in a flexible and easy-to-use way. POA coupled with the use of explorative probes to detect the unknown part is now one of the most powerful approaches for a better understanding of microbial community functioning. However, the selection of probes remains a very difficult task. The rapid growth of environmental databases has led to an exponential increase of data to be managed for an efficient design. Consequently, the use of high performance computing facilities is mandatory. In this paper, we present an efficient parallelization method to select known and explorative oligonucleotide probes at large scale using computing grids. We implemented a software that generates and monitors thousands of jobs over the European Computing Grid Infrastructure (EGI). We also developed a new algorithm for the construction of a high-quality curated phylogenetic database to avoid erroneous design due to bad sequence affiliation. We present here the performance and statistics of our method on real biological datasets based on a phylogenetic prokaryotic database at the genus level and a complete design of about 20,000 probes for 2,069 genera of prokaryotes. Faouzi Jaziri, Eric Peyretaillade, Mohieddine Missaoui, Nicolas Parisot, Sébastien Cipière, Jérémie Denonfoux, Antoine Mahul, Pierre Peyret, and David R. C. Hill Copyright © 2014 Faouzi Jaziri et al. All rights reserved. Nonlinear-Model-Based Analysis Methods for Time-Course Gene Expression Data Thu, 02 Jan 2014 15:48:10 +0000 Microarray technology has produced a huge body of time-course gene expression data and will continue to produce more. 
Such gene expression data has proved useful in genomic disease diagnosis and drug design. The challenge is how to uncover useful information from such data by proper analysis methods such as significance analysis and clustering analysis. Many statistics-based significance analysis methods and distance/correlation-based clustering analysis methods have been applied to time-course expression data. However, these techniques are unable to account for the dynamics of such data. It is the dynamics that characterizes such data and that should be considered in the analysis of such data. In this paper, we employ a nonlinear model to analyse time-course gene expression data. We first develop an efficient method for estimating the parameters in the nonlinear model. Then we utilize this model to perform the significance analysis of individually differentially expressed genes and clustering analysis of a set of gene expression profiles. The verification with two synthetic datasets shows that our developed significance analysis method and cluster analysis method outperform some existing methods. The application to one real-life biological dataset illustrates that the analysis results of our developed methods are in agreement with the existing results. Li-Ping Tian, Li-Zhi Liu, and Fang-Xiang Wu Copyright © 2014 Li-Ping Tian et al. All rights reserved. Adaptive Shooting Regularization Method for Survival Analysis Using Gene Expression Data Sun, 15 Dec 2013 13:32:09 +0000 A new adaptive shooting regularization method for variable selection based on the Cox proportional hazards model is proposed. This adaptive shooting algorithm can be easily obtained by the optimization of a reweighted iterative series of penalties and a shooting strategy of penalty. Simulation results based on high dimensional artificial data show that the adaptive shooting regularization method can be more accurate for variable selection than Lasso and adaptive Lasso methods.
The results from a real gene expression dataset (DLBCL) also indicate that the regularization method performs competitively. Xiao-Ying Liu, Yong Liang, Zong-Ben Xu, Hai Zhang, and Kwong-Sak Leung Copyright © 2013 Xiao-Ying Liu et al. All rights reserved. Identification of Biomarkers for Esophageal Squamous Cell Carcinoma Using Feature Selection and Decision Tree Methods Thu, 12 Dec 2013 11:04:28 +0000 Esophageal squamous cell cancer (ESCC) is one of the most common fatal human cancers. The identification of biomarkers for early detection could be a promising strategy to decrease mortality. Previous studies utilized microarray techniques to identify more than one hundred genes; however, it is desirable to identify a small set of biomarkers for clinical use. This study proposes a sequential forward feature selection algorithm to design decision tree models for discriminating ESCC from normal tissues. Two potential biomarkers, RUVBL1 and CNIH, were identified and validated based on two publicly available microarray datasets. To test the discrimination ability of the two biomarkers, 17 pairs of expression profiles of ESCC and normal tissues from Taiwanese male patients were measured by using microarray techniques. The classification accuracies of the two biomarkers in all three datasets were higher than 90%. Interpretable decision tree models were constructed to analyze expression patterns of the two biomarkers. RUVBL1 was consistently overexpressed in all three datasets, although we found inconsistent CNIH expression, possibly affected by the diverse major risk factors for ESCC across different areas. Chun-Wei Tung, Ming-Tsang Wu, Yu-Kuei Chen, Chun-Chieh Wu, Wei-Chung Chen, Hsien-Pin Li, Shah-Hwa Chou, Deng-Chyang Wu, and I-Chen Wu Copyright © 2013 Chun-Wei Tung et al. All rights reserved.
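The general idea of sequential forward feature selection wrapped around a decision tree, as named in the ESCC abstract, can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' algorithm: the small greedy threshold tree, the leave-one-out scoring, the stopping rule, and the toy expression matrix (three hypothetical gene columns) are all invented for the example.

```python
from collections import Counter


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def build_tree(rows, labels, features, depth=2):
    """Greedy threshold tree restricted to the given feature indices."""
    if depth == 0 or len(set(labels)) == 1:
        return majority(labels)
    best = None
    for f in features:
        values = sorted({row[f] for row in rows})
        for low, high in zip(values, values[1:]):
            t = (low + high) / 2.0  # candidate split at the midpoint
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            errors = (len(left) - Counter(left).most_common(1)[0][1]
                      + len(right) - Counter(right).most_common(1)[0][1])
            if best is None or errors < best[0]:
                best = (errors, f, t)
    if best is None:  # no feature offers two distinct values
        return majority(labels)
    _, f, t = best
    left_idx = [i for i, row in enumerate(rows) if row[f] <= t]
    right_idx = [i for i, row in enumerate(rows) if row[f] > t]
    return (f, t,
            build_tree([rows[i] for i in left_idx],
                       [labels[i] for i in left_idx], features, depth - 1),
            build_tree([rows[i] for i in right_idx],
                       [labels[i] for i in right_idx], features, depth - 1))


def predict(tree, row):
    while isinstance(tree, tuple):  # internal nodes are tuples, leaves are labels
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a tree that may only use `features`."""
    hits = 0
    for i in range(len(rows)):
        tree = build_tree(rows[:i] + rows[i + 1:],
                          labels[:i] + labels[i + 1:], features)
        hits += predict(tree, rows[i]) == labels[i]
    return hits / len(rows)


def forward_select(rows, labels, max_features=2):
    """Add one feature at a time while the leave-one-out accuracy improves."""
    selected, best_score = [], 0.0
    remaining = list(range(len(rows[0])))
    while remaining and len(selected) < max_features:
        scored = [(loo_accuracy(rows, labels, selected + [f]), -f, f)
                  for f in remaining]
        score, _, f = max(scored)  # ties go to the lowest feature index
        if score <= best_score:
            break
        selected.append(f)
        remaining.remove(f)
        best_score = score
    return selected, best_score


# Toy expression matrix: three hypothetical genes; only column 1 separates classes.
rows = [[5.0, 1.0, 3.0], [2.0, 1.2, 7.0], [6.0, 0.9, 4.0], [1.0, 1.1, 6.0],
        [4.0, 3.0, 5.0], [3.0, 3.2, 2.0], [7.0, 2.9, 8.0], [0.5, 3.1, 1.0]]
labels = ["normal"] * 4 + ["tumor"] * 4
selected, score = forward_select(rows, labels)
```

On this toy matrix the search keeps only column 1 and stops, because adding a second column cannot improve on a leave-one-out accuracy of 1.0. Stopping when the score no longer improves is what keeps the selected set small, which matches the abstract's stated goal of a small biomarker panel.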
Gene Structures, Classification, and Expression Models of the DREB Transcription Factor Subfamily in Populus trichocarpa Wed, 13 Nov 2013 18:11:28 +0000 We identified 75 dehydration-responsive element-binding (DREB) protein genes in Populus trichocarpa. We analyzed gene structures, phylogenies, domain duplications, genome localizations, and expression profiles. The phylogenic construction suggests that the PtrDREB gene subfamily can be classified broadly into six subtypes (DREB A-1 to A-6) in Populus. The chromosomal localizations of the PtrDREB genes indicated 18 segmental duplication events involving 36 genes and six redundant PtrDREB genes were involved in tandem duplication events. There were fewer introns in the PtrDREB subfamily. The motif composition of PtrDREB was highly conserved in the same subtype. We investigated expression profiles of this gene subfamily from different tissues and/or developmental stages. Sixteen genes present in the digital expression analysis had high levels of transcript accumulation. The microarray results suggest that 18 genes were upregulated. We further examined the stress responsiveness of 15 genes by qRT-PCR. A digital northern analysis showed that the PtrDREB17, 18, and 32 genes were highly induced in leaves under cold stress, and the same expression trends were shown by qRT-PCR. Taken together, these observations may lay the foundation for future functional analyses to unravel the biological roles of Populus’ DREB genes. Yunlin Chen, Jingli Yang, Zhanchao Wang, Haizhen Zhang, Xuliang Mao, and Chenghao Li Copyright © 2013 Yunlin Chen et al. All rights reserved. Molecular Dynamic Simulation to Explore the Molecular Basis of Btk-PH Domain Interaction with Ins(1,3,4,5)P4 Wed, 06 Nov 2013 14:31:40 +0000 Bruton’s tyrosine kinase contains a pleckstrin homology domain, and it specifically binds inositol 1,3,4,5-tetrakisphosphate (Ins(1,3,4,5)P4), which is involved in the maturation of B cells. 
In this paper, we studied 12 systems including the wild type and 11 mutants, K12R, S14F, K19E, R28C/H, E41K, L11P, F25S, Y40N, and K12R-R28C/H, to investigate any change in the ligand binding site of each mutant. Molecular dynamics simulations combined with the method of molecular mechanics/Poisson-Boltzmann solvent-accessible surface area have been applied to the twelve systems, and reasonable mutant structures and their binding free energies have been obtained as criteria in the final classification. As a result, five structures, K12R, K19E, R28C/H, and E41K mutants, were classified as “functional mutations,” whereas L11P, S14F, F25S, and Y40N were grouped into “folding mutations.” This rigorous study of the binding affinity of each of the mutants and their classification provides some new insights into the biological function of the Btk-PH domain and related mutation-causing diseases. Dan Lu, Junfeng Jiang, Zhongjie Liang, Maomin Sun, Cheng Luo, Bairong Shen, and Guang Hu Copyright © 2013 Dan Lu et al. All rights reserved. Application of Bioinformatics in Chronobiology Research Wed, 25 Sep 2013 17:39:04 +0000 Bioinformatics and other well-established sciences, such as molecular biology, genetics, and biochemistry, provide a scientific approach for the analysis of data generated through “omics” projects that may be used in studies of chronobiology. The results of studies that apply these techniques demonstrate how they significantly aided the understanding of chronobiology. However, bioinformatics tools alone cannot eliminate the need for an understanding of the field of research or the data to be considered, nor can such tools replace analysts and researchers. It is often necessary to conduct an evaluation of the results of a data mining effort to determine the degree of reliability. To this end, familiarity with the field of investigation is necessary. 
It is evident that the knowledge that has been accumulated through chronobiology and the use of tools derived from bioinformatics has contributed to the recognition and understanding of the patterns and biological rhythms found in living organisms. The current work aims to develop new and important applications in the near future through chronobiology research. Robson da Silva Lopes, Nathalia Maria Resende, Adenilda Cristina Honorio-França, and Eduardo Luzía França Copyright © 2013 Robson da Silva Lopes et al. All rights reserved. Phylogenetic, Expression, and Bioinformatic Analysis of the ABC1 Gene Family in Populus trichocarpa Sun, 15 Sep 2013 17:48:10 +0000 We studied 17 ABC1 genes in Populus trichocarpa, all of which contained an ABC1 domain consisting of about 120 amino acid residues. Most of the ABC1 gene products were located in the mitochondria or chloroplasts. All had a conserved VAVK-like motif and a DFG motif. Phylogenetic analysis grouped the genes into three subgroups. In addition, the chromosomal locations of the genes on the 19 Populus chromosomes were determined. Gene structure was studied through exon/intron organization and the MEME motif finder, while heatmap was used to study the expression diversity using EST libraries. According to the heatmap, PtrABC1P14 was highlighted because of the high expression in tension wood which related to secondary cell wall formation and cellulose synthesis, thus making a contribution to follow-up experiment in wood formation. Promoter cis-element analysis indicated that almost all of the ABC1 genes contained one or two cis-elements related to ABA signal transduction pathway and drought stress. Quantitative real-time PCR was carried out to evaluate the expression of all of the genes under abiotic stress conditions (ABA, CdCl2, high temperature, high salinity, and drought); the results showed that some of the genes were affected by these stresses and confirmed the results of promoter cis-element analysis. 
Zhanchao Wang, Haizhen Zhang, Jingli Yang, Yunlin Chen, Xuemei Xu, Xuliang Mao, and Chenghao Li Copyright © 2013 Zhanchao Wang et al. All rights reserved. Bioinformatics and Biomedical Informatics Wed, 29 May 2013 17:14:06 +0000 Kayvan Najarian, Rachid Deriche, Mark A. Kon, and Nina S. T. Hirata Copyright © 2013 Kayvan Najarian et al. All rights reserved. A Hierarchical Method for Removal of Baseline Drift from Biomedical Signals: Application in ECG Analysis Mon, 20 May 2013 15:41:31 +0000 Noise can compromise the extraction of some fundamental and important features from biomedical signals and hence prohibit accurate analysis of these signals. Baseline wander in electrocardiogram (ECG) signals is one such example, which can be caused by factors such as respiration, variations in electrode impedance, and excessive body movements. Unless baseline wander is effectively removed, the accuracy of any feature extracted from the ECG, such as timing and duration of the ST-segment, is compromised. This paper approaches this filtering task from a novel standpoint by assuming that the ECG baseline wander comes from an independent and unknown source. The technique utilizes a hierarchical method including a blind source separation (BSS) step, in particular independent component analysis, to eliminate the effect of the baseline wander. We examine the specifics of the components causing the baseline wander and the factors that affect the separation process. Experimental results reveal the superiority of the proposed algorithm in removing the baseline wander. Yurong Luo, Rosalyn H. Hargraves, Ashwin Belle, Ou Bai, Xuguang Qi, Kevin R. Ward, Michael Paul Pfaffenberger, and Kayvan Najarian Copyright © 2013 Yurong Luo et al. All rights reserved. 
Extracting Physicochemical Features to Predict Protein Secondary Structure Tue, 14 May 2013 11:09:14 +0000 We propose a protein secondary structure prediction method based on position-specific scoring matrix (PSSM) profiles and four physicochemical features including conformation parameters, net charges, hydrophobic, and side chain mass. First, the SVM with the optimal window size and the optimal parameters of the kernel function is found. Then, we train the SVM using the PSSM profiles generated from PSI-BLAST and the physicochemical features extracted from the CB513 data set. Finally, we use the filter to refine the predicted results from the trained SVM. For all the performance measures of our method, reaches 79.52, SOV94 reaches 86.10, and SOV99 reaches 74.60; all the measures are higher than those of the SVMpsi method and the SVMfreq method. This validates that considering these physicochemical features in predicting protein secondary structure would exhibit better performances. Yin-Fu Huang and Shu-Ying Chen Copyright © 2013 Yin-Fu Huang and Shu-Ying Chen. All rights reserved. Functional Implications of Local DNA Structures in Regulatory Motifs Tue, 14 May 2013 11:06:33 +0000 The three-dimensional structure of DNA has been proposed to be a major determinant for functional transcription factors (TFs) and DNA interaction. Here, we use hydroxyl radical cleavage pattern as a measure of local DNA structure. We compared the conservation between DNA sequence and structure in terms of information content and attempted to assess the functional implications of DNA structures in regulatory motifs. We used statistical methods to evaluate the structural divergence of substituting a single position within a binding site and applied them to a collection of putative regulatory motifs. 
The following are our major observations: (i) we observed more information in structural alignment than in the corresponding sequence alignment for most of the transcriptional factors; (ii) for each TF, majority of positions have more information in the structural alignment as compared to the sequence alignment; (iii) we further defined a DNA structural divergence score (SD score) for each wild-type and mutant pair that is distinguished by single-base mutation. The SD score for benign mutations is significantly lower than that of switch mutations. This indicates structural conservation is also important for TFBS to be functional and DNA structures will provide previously unappreciated information for TF to realize the binding specificity. Qian Xiang Copyright © 2013 Qian Xiang. All rights reserved. Discovering Weighted Patterns in Intron Sequences Using Self-Adaptive Harmony Search and Back-Propagation Algorithms Wed, 08 May 2013 11:23:36 +0000 A hybrid self-adaptive harmony search and back-propagation mining system was proposed to discover weighted patterns in human intron sequences. By testing the weights under a lazy nearest neighbor classifier, the numerical results revealed the significance of these weighted patterns. Comparing these weighted patterns with the popular intron consensus model, it is clear that the discovered weighted patterns make originally the ambiguous 5SS and 3SS header patterns more specific and concrete. Yin-Fu Huang, Chia-Ming Wang, and Sing-Wu Liou Copyright © 2013 Yin-Fu Huang et al. All rights reserved. 
Mortality Predicted Accuracy for Hepatocellular Carcinoma Patients with Hepatic Resection Using Artificial Neural Network Tue, 30 Apr 2013 08:15:04 +0000 The aim of this present study is firstly to compare significant predictors of mortality for hepatocellular carcinoma (HCC) patients undergoing resection between artificial neural network (ANN) and logistic regression (LR) models and secondly to evaluate the predictive accuracy of ANN and LR in different survival year estimation models. We constructed a prognostic model for 434 patients with 21 potential input variables by Cox regression model. Model performance was measured by numbers of significant predictors and predictive accuracy. The results indicated that ANN had double to triple numbers of significant predictors at 1-, 3-, and 5-year survival models as compared with LR models. Scores of accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC) of 1-, 3-, and 5-year survival estimation models using ANN were superior to those of LR in all the training sets and most of the validation sets. The study demonstrated that ANN not only had a great number of predictors of mortality variables but also provided accurate prediction, as compared with conventional methods. It is suggested that physicians consider using data mining methods as supplemental tools for clinical decision-making and prognostic evaluation. Herng-Chia Chiu, Te-Wei Ho, King-Teh Lee, Hong-Yaw Chen, and Wen-Hsien Ho Copyright © 2013 Herng-Chia Chiu et al. All rights reserved. Survival Analysis by Penalized Regression and Matrix Factorization Tue, 23 Apr 2013 09:34:57 +0000 Because every disease has its unique survival pattern, it is necessary to find a suitable model to simulate followups. DNA microarray is a useful technique to detect thousands of gene expressions at one time and is usually employed to classify different types of cancer. 
We propose combination methods of penalized regression models and nonnegative matrix factorization (NMF) for predicting survival. We tried L1- (lasso), L2- (ridge), and L1-L2 combined (elastic net) penalized regression for diffuse large B-cell lymphoma (DLBCL) patients' microarray data and found that the L1-L2 combined method predicts survival best with the smallest logrank value. Furthermore, 80% of selected genes have been reported to correlate with carcinogenesis or lymphoma. Through NMF we found that DLBCL patients can be clearly divided into 4 groups, which implies that DLBCL may have 4 subtypes with slightly different survival patterns. Next we excluded some patients whom NMF indicated to be hard to classify and ran the three penalized regression models again. We found that the performance of survival prediction improved, with lower logrank values. Therefore, we conclude that after preselection of patients by NMF, penalized regression models can predict DLBCL patients' survival successfully. Yeuntyng Lai, Morihiro Hayashida, and Tatsuya Akutsu Copyright © 2013 Yeuntyng Lai et al. All rights reserved. Prediction of Associations between OMIM Diseases and MicroRNAs by Random Walk on OMIM Disease Similarity Network Wed, 20 Mar 2013 10:39:58 +0000 Increasing evidence has revealed that microRNAs (miRNAs) play important roles in the development and progression of human diseases. However, efforts made to uncover OMIM disease-miRNA associations are lacking and the majority of diseases in the OMIM database are not associated with any miRNA. Therefore, there is a strong incentive to develop computational methods to detect potential OMIM disease-miRNA associations. In this paper, random walk on OMIM disease similarity network is applied to predict potential OMIM disease-miRNA associations under the assumption that functionally related miRNAs are often associated with phenotypically similar diseases. Our method makes full use of global disease similarity values. 
We tested our method on 1226 known OMIM disease-miRNA associations in the framework of leave-one-out cross-validation and achieved an area under the ROC curve of 71.42%. This performance enables us to predict a number of new potential OMIM disease-miRNA associations, and the newly predicted associations are publicly released to facilitate future studies. Some predicted associations with high ranks were manually checked and confirmed from publicly available databases, which is strong evidence for the practical relevance of our method. Hailin Chen and Zuping Zhang Copyright © 2013 Hailin Chen and Zuping Zhang. All rights reserved. A Comparative Genomic Study in Schizophrenic and in Bipolar Disorder Patients, Based on Microarray Expression Profiling Meta-Analysis Sun, 10 Mar 2013 10:07:12 +0000 Schizophrenia, affecting almost 1% of the global population, and bipolar disorder, affecting almost 3%–5%, are two severe mental disorders. The catecholaminergic and the serotonergic pathways have been shown to play an important role in the development of schizophrenia, bipolar disorder, and other related psychiatric disorders. The aim of the study was to perform and interpret the results of a comparative genomic profiling study in schizophrenic patients as well as in healthy controls and in patients with bipolar disorder and to try to relate and integrate our results with an aberrant amino acid transport through cell membranes. In particular we have focused on genes and mechanisms involved in amino acid transport through cell membranes from whole genome expression profiling data. We performed bioinformatic analysis on raw data derived from four different published studies. In two studies postmortem samples from prefrontal cortices, derived from patients with bipolar disorder, schizophrenia, and control subjects, have been used. 
In another study we used samples from postmortem orbitofrontal cortex of bipolar subjects while the final study was performed based on raw data from a gene expression profiling dataset in the postmortem superior temporal cortex of schizophrenics. The data were downloaded from NCBI's GEO datasets. Marianthi Logotheti, Olga Papadodima, Nikolaos Venizelos, Aristotelis Chatziioannou, and Fragiskos Kolisis Copyright © 2013 Marianthi Logotheti et al. All rights reserved. NCK2 Is Significantly Associated with Opiates Addiction in African-Origin Men Thu, 28 Feb 2013 17:12:27 +0000 Substance dependence is a complex environmental and genetic disorder with significant social and medical concerns. Understanding the etiology of substance dependence is imperative to the development of effective treatment and prevention strategies. To this end, substantial effort has been made to identify genes underlying substance dependence, and in recent years, genome-wide association studies (GWASs) have led to discoveries of numerous genetic variants for complex diseases including substance dependence. Most of the GWAS discoveries were only based on single nucleotide polymorphisms (SNPs) and a single dichotomized outcome. By employing both SNP- and gene-based methods of analysis, we identified a strong (odds ratio = 13.87) and significant (P value = ) association of an SNP in the NCK2 gene on chromosome 2 with opiates addiction in African-origin men. Codependence analysis also identified a genome-wide significant association between NCK2 and comorbidity of substance dependence (P value = ) in African-origin men. Furthermore, we observed that the association between the NCK2 gene (P value = ) and opiates addiction reached the gene-based genome-wide significant level. In summary, our findings provided the first evidence for the involvement of NCK2 in the susceptibility to opiates addiction and further revealed the racial and gender specificities of its impact. 
Zhifa Liu, Xiaobo Guo, Yuan Jiang, and Heping Zhang Copyright © 2013 Zhifa Liu et al. All rights reserved. Biomedical Informatics for Computer-Aided Decision Support Systems: A Survey Mon, 04 Feb 2013 11:23:23 +0000 The volumes of current patient data as well as their complexity make clinical decision making more challenging than ever for physicians and other care givers. This situation calls for the use of biomedical informatics methods to process data and form recommendations and/or predictions to assist such decision makers. The design, implementation, and use of biomedical informatics systems in the form of computer-aided decision support have become essential and widely used over the last two decades. This paper provides a brief review of such systems, their application protocols and methodologies, and the future challenges and directions they suggest. Ashwin Belle, Mark A. Kon, and Kayvan Najarian Copyright © 2013 Ashwin Belle et al. All rights reserved. TOPPER: Topology Prediction of Transmembrane Protein Based on Evidential Reasoning Thu, 17 Jan 2013 09:12:31 +0000 The topology prediction of transmembrane protein is a hot research field in bioinformatics and molecular biology. It is a typical pattern recognition problem. Various prediction algorithms are developed to predict the transmembrane protein topology since the experimental techniques have been restricted by many stringent conditions. Usually, these individual prediction algorithms depend on various principles such as the hydrophobicity or charges of residues. In this paper, an evidential topology prediction method for transmembrane protein is proposed based on evidential reasoning, which is called TOPPER (topology prediction of transmembrane protein based on evidential reasoning). In the proposed method, the prediction results of multiple individual prediction algorithms can be transformed into BPAs (basic probability assignments) according to the confusion matrix. 
Then, the final prediction result can be obtained by combining the individual predictions based on Dempster’s rule of combination. The experimental results show that the proposed method is superior to the individual prediction algorithms, which illustrates the effectiveness of the proposed method. Xinyang Deng, Qi Liu, Yong Hu, and Yong Deng Copyright © 2013 Xinyang Deng et al. All rights reserved. Robust Microarray Meta-Analysis Identifies Differentially Expressed Genes for Clinical Prediction Tue, 18 Dec 2012 11:10:42 +0000 Combining multiple microarray datasets increases sample size and leads to improved reproducibility in identification of informative genes and subsequent clinical prediction. Although microarrays have increased the rate of genomic data collection, sample size is still a major issue when identifying informative genetic biomarkers. Because of this, feature selection methods often suffer from false discoveries, resulting in poorly performing predictive models. We develop a simple meta-analysis-based feature selection method that captures the knowledge in each individual dataset and combines the results using a simple rank average. In a comprehensive study that measures robustness in terms of clinical application (i.e., breast, renal, and pancreatic cancer), microarray platform heterogeneity, and classifier (i.e., logistic regression, diagonal LDA, and linear SVM), we compare the rank average meta-analysis method to five other meta-analysis methods. Results indicate that rank average meta-analysis consistently performs well compared to five other meta-analysis methods. John H. Phan, Andrew N. Young, and May D. Wang Copyright © 2012 John H. Phan et al. All rights reserved. 
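The rank-average idea in the abstract above can be illustrated with a short Python sketch. It assumes each dataset has already been reduced to one score per gene (for example an absolute test statistic, which the abstract does not specify); the function names and the toy scores are hypothetical and this is not the authors' code.

```python
from typing import Dict, List


def rank_genes(scores: Dict[str, float]) -> Dict[str, float]:
    """Rank genes by descending score (rank 1 = best); ties share the mean rank."""
    ordered = sorted(scores, key=lambda gene: -scores[gene])
    ranks: Dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of genes tied with ordered[i].
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        mean_rank = ((i + 1) + (j + 1)) / 2
        for k in range(i, j + 1):
            ranks[ordered[k]] = mean_rank
        i = j + 1
    return ranks


def rank_average(datasets: List[Dict[str, float]]) -> List[str]:
    """Order the genes shared by all datasets by their mean within-dataset rank."""
    common = set(datasets[0])
    for scores in datasets[1:]:
        common &= set(scores)
    per_dataset = [rank_genes({g: d[g] for g in common}) for d in datasets]
    mean_rank = {g: sum(r[g] for r in per_dataset) / len(per_dataset) for g in common}
    # Break ties in mean rank alphabetically so the output is deterministic.
    return sorted(common, key=lambda g: (mean_rank[g], g))


# Made-up per-gene scores from three hypothetical studies.
study_1 = {"A": 5.0, "B": 3.0, "C": 1.0}
study_2 = {"A": 2.0, "B": 4.0, "C": 0.5}
study_3 = {"A": 3.5, "B": 3.0, "C": 3.2}

print(rank_average([study_1, study_2, study_3]))
```

Working on ranks rather than raw scores is what lets studies from different microarray platforms be combined, since only the within-study ordering of genes is compared.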
Novel Computational Methodologies for Structural Modeling of Spacious Ligand Binding Sites of G-Protein-Coupled Receptors: Development and Application to Human Leukotriene B4 Receptor Mon, 10 Dec 2012 15:38:37 +0000 This paper describes a novel method to predict the activated structures of G-protein-coupled receptors (GPCRs) with high accuracy, while aiming for the use of the predicted 3D structures in in silico virtual screening in the future. We propose a new method for modeling GPCR thermal fluctuations, where conformation changes of the proteins are modeled by combining fluctuations on multiple time scales. The core idea of the method is that a molecular dynamics simulation is used to calculate average 3D coordinates of all atoms of a GPCR protein against heat fluctuation on the picosecond or nanosecond time scale, and then evolutionary computation including receptor-ligand docking simulations functions to determine the rotation angle of each helix of a GPCR protein as a movement on a longer time scale. The method was validated using human leukotriene B4 receptor BLT1 as a sample GPCR. Our study demonstrated that the proposed method was able to derive the appropriate 3D structure of the active-state GPCR which docks with its agonists. Yoko Ishino and Takanori Harada Copyright © 2012 Yoko Ishino and Takanori Harada. All rights reserved. Gene Expression Profiles for Predicting Metastasis in Breast Cancer: A Cross-Study Comparison of Classification Methods Wed, 28 Nov 2012 08:53:21 +0000 Machine learning has increasingly been used with microarray gene expression data and for the development of classifiers using a variety of methods. However, method comparisons in cross-study datasets are very scarce. This study compares the performance of seven classification methods and the effect of voting for predicting metastasis outcome in breast cancer patients, in three situations: within the same dataset or across datasets on similar or dissimilar microarray platforms. 
Combining classification results from seven classifiers into one voting decision performed significantly better during internal validation as well as external validation in similar microarray platforms than the underlying classification methods. When validating between different microarray platforms, random forest, another voting-based method, proved to be the best performing method. We conclude that voting based classifiers provided an advantage with respect to classifying metastasis outcome in breast cancer patients. Mark Burton, Mads Thomassen, Qihua Tan, and Torben A. Kruse Copyright © 2012 Mark Burton et al. All rights reserved. Sequence Comparison Alignment-Free Approach Based on Suffix Tree and L-Words Frequency Mon, 10 Sep 2012 14:39:31 +0000 The vast majority of methods available for sequence comparison rely on a first sequence alignment step, which requires a number of assumptions on evolutionary history and is sometimes very difficult or impossible to perform due to the abundance of gaps (insertions/deletions). In such cases, an alternative alignment-free method would prove valuable. Our method starts by a computation of a generalized suffix tree of all sequences, which is completed in linear time. Using this tree, the frequency of all possible words with a preset length L—L-words—in each sequence is rapidly calculated. Based on the L-words frequency profile of each sequence, a pairwise standard Euclidean distance is then computed producing a symmetric genetic distance matrix, which can be used to generate a neighbor joining dendrogram or a multidimensional scaling graph. We present an improvement to word counting alignment-free approaches for sequence comparison, by determining a single optimal word length and combining suffix tree structures to the word counting tasks. Our approach is, thus, a fast and simple application that proved to be efficient and powerful when applied to mitochondrial genomes. 
The algorithm was implemented in Python and is freely available on the web. Inês Soares, Ana Goios, and António Amorim Copyright © 2012 Inês Soares et al. All rights reserved. A Learning-Based Approach for Biomedical Word Sense Disambiguation Tue, 01 May 2012 16:10:13 +0000 In the biomedical domain, word sense ambiguity is a widespread problem, yet the bioinformatics research effort devoted to it is not commensurate, leaving room for further development. This paper presents and evaluates a learning-based approach for sense disambiguation within the biomedical domain. The main limitation of supervised methods is the need for a corpus of manually disambiguated instances of the ambiguous words. However, advances in automatic text annotation and tagging techniques, together with the plethora of knowledge sources such as ontologies and text literature in the biomedical domain, will help lessen this limitation. The proposed method utilizes the interaction model (mutual information) between the context words and the senses of the target word to induce reliable learning models for sense disambiguation. The method has been evaluated with the benchmark dataset NLM-WSD with various settings and in biomedical entity species disambiguation. The evaluation results showed that the approach is very competitive and outperforms recently reported results of other published techniques. Hisham Al-Mubaid and Sandeep Gungu Copyright © 2012 Hisham Al-Mubaid and Sandeep Gungu. All rights reserved. Ligand-Based Virtual Screening Using Bayesian Inference Network and Reweighted Fragments Tue, 01 May 2012 15:42:31 +0000 Many of the similarity-based virtual screening approaches assume that molecular fragments that are not related to the biological activity carry the same weight as the important ones. This was the reason that led to the use of Bayesian networks as an alternative to existing tools for similarity-based virtual screening. 
In our recent work, the retrieval performance of the Bayesian inference network (BIN) was observed to improve significantly when molecular fragments were reweighted using the relevance feedback information. In this paper, a set of active reference structures were used to reweight the fragments in the reference structure. In this approach, higher weights were assigned to those fragments that occur more frequently in the set of active reference structures while others were penalized. Simulated virtual screening experiments with MDL Drug Data Report datasets showed that the proposed approach significantly improved the retrieval effectiveness of ligand-based virtual screening, especially when the active molecules being sought had a high degree of structural heterogeneity. Ali Ahmed, Ammar Abdo, and Naomie Salim Copyright © 2012 Ali Ahmed et al. All rights reserved. Effects of Pooling Samples on the Performance of Classification Algorithms: A Comparative Study Mon, 30 Apr 2012 13:51:14 +0000 A pooling design can be used as a powerful strategy to compensate for limited amounts of samples or high biological variation. In this paper, we perform a comparative study to model and quantify the effects of virtual pooling on the performance of the widely applied classifiers, support vector machines (SVMs), random forest (RF), k-nearest neighbors (k-NN), penalized logistic regression (PLR), and prediction analysis for microarrays (PAMs). We evaluate a variety of experimental designs using mock omics datasets with varying levels of pool sizes and considering effects from feature selection. Our results show that feature selection significantly improves classifier performance for non-pooled and pooled data. All investigated classifiers yield lower misclassification rates with smaller pool sizes. RF mainly outperforms other investigated algorithms, while accuracy levels are comparable among all the remaining ones. 
Guidelines are derived to identify an optimal pooling scheme for obtaining adequate predictive power and, hence, to motivate a study design that meets best experimental objectives and budgetary conditions, including time constraints. Kanthida Kusonmano, Michael Netzer, Christian Baumgartner, Matthias Dehmer, Klaus R. Liedl, and Armin Graber Copyright © 2012 Kanthida Kusonmano et al. All rights reserved. A Novel Partial Sequence Alignment Tool for Finding Large Deletions Sun, 01 Apr 2012 08:52:33 +0000 Finding large deletions in genome sequences has become increasingly more useful in bioinformatics, such as in clinical research and diagnosis. Although there are a number of publically available next generation sequencing mapping and sequence alignment programs, these software packages do not correctly align fragments containing deletions larger than one kb. We present a fast alignment software package, BinaryPartialAlign, that can be used by wet lab scientists to find long structural variations in their experiments. For BinaryPartialAlign, we make use of the Smith-Waterman (SW) algorithm with a binary-search-based approach for alignment with large gaps that we called partial alignment. BinaryPartialAlign implementation is compared with other straight-forward applications of SW. Simulation results on mtDNA fragments demonstrate the effectiveness (runtime and accuracy) of the proposed method. Taner Aruk, Duran Ustek, and Olcay Kursun Copyright © 2012 Taner Aruk et al. All rights reserved. Nonlinear Model-Based Method for Clustering Periodically Expressed Genes Tue, 01 Nov 2011 00:00:00 +0000 Clustering periodically expressed genes from their time-course expression data could help understand the molecular mechanism of those biological processes. In this paper, we propose a nonlinear model-based clustering method for periodically expressed gene profiles. 
As periodically expressed genes are associated with periodic biological processes, the proposed method naturally assumes that a periodically expressed gene dataset is generated by a number of periodical processes. Each periodical process is modelled by a linear combination of trigonometric sine and cosine functions in time plus a Gaussian noise term. A two-stage method is proposed to estimate the model parameters, and a relocation-iteration algorithm is employed to assign each gene to an appropriate cluster. A bootstrapping method and an average adjusted Rand index (AARI) are employed to measure the quality of clustering. One synthetic dataset and two biological datasets were employed to evaluate the performance of the proposed method. The results show that our method yields better-quality clustering than other clustering methods (e.g., k-means) for periodically expressed gene data, and thus it is an effective cluster analysis method for such data. Li-Ping Tian, Li-Zhi Liu, Qian-Wei Zhang, and Fang-Xiang Wu Copyright © 2011 Li-Ping Tian et al. All rights reserved. Stimulation of Apoptosis by Computationally Derived Small Molecules that Bind to BCL-2 Mon, 01 Jan 1900 00:00:00 +0000 Martha Mutomba, Jing Wang, Sergei Mailiartchouk, Tom Brady, Darryl Rideout, Christina Niemeyer, Hengyi Zhu, Cindy Fisher, Seymour Mong, and Kal Ramnarayan Copyright © 2001 Martha Mutomba et al. All rights reserved. Combining Bioinformatics and Biophysics to Understand Protein-Protein and Protein-Ligand Interactions Mon, 01 Jan 1900 00:00:00 +0000 The increasing numbers of proteins whose three-dimensional structures have been determined will have a major impact on the ability to exploit genomic data. Sequence alignments will become more meaningful, protein structure prediction will become more accurate, and the prediction of protein function will become increasingly refined and precise.
Such developments will require that sequence, structure, and physical chemical information be fully integrated and correlated with biological data in as much detail as possible. We have been developing a series of computational tools with the goal of detecting relationships among amino acid sequence, protein structure and protein function. In this context, recent computational advances in using structure to improve sequence alignments, in homology model building and in the calculation of binding affinities will be summarized as will their combined use, with specific application to understanding the principles of protein-protein and protein-ligand interactions. Barry Honig Copyright © 2002 Barry Honig. All rights reserved. Protocols for 16S rDNA Array Analyses of Microbial Communities by Sequence-Specific Labeling of DNA Probes Mon, 01 Jan 1900 00:00:00 +0000 Analyses of complex microbial communities are becoming increasingly important. Bottlenecks in these analyses, however, are the tools to actually describe the biodiversity. Novel protocols for DNA array-based analyses of microbial communities are presented. In these protocols, the specificity obtained by sequence-specific labeling of DNA probes is combined with the possibility of detecting several different probes simultaneously by DNA array hybridization. The gene encoding 16S ribosomal RNA was chosen as the target in these analyses. This gene contains both universally conserved regions and regions with relatively high variability. The universally conserved regions are used for PCR amplification primers, while the variable regions are used for the specific probes. Protocols are presented for DNA purification, probe construction, probe labeling, and DNA array hybridizations. Knut Rudi, Janneke Treimo, Hilde Nissen, and Gerd Vegarud Copyright © 2003 Knut Rudi et al. All rights reserved. 
Crystal Structures of Tcl1 Family Oncoproteins and Their Conserved Surface Features Mon, 01 Jan 1900 00:00:00 +0000 Members of the TCL1 family of oncogenes are abnormally expressed in mature T-cell leukemias and B-cell lymphomas. The proteins are involved in the coactivation of protein kinase B (Akt/PKB), a key intracellular kinase. The sequences and crystal structures of three Tcl1 proteins were analyzed in order to understand their interactions with Akt/PKB and the implications for lymphocyte malignancies. Tcl1 proteins are ~15 kD and share 25–80% amino acid sequence identity. The tertiary structures of mouse Tcl1, human Tcl1, and Mtcp1 are very similar. Analysis of the structures revealed conserved semi-planar surfaces that have characteristics of surfaces involved in protein-protein interactions. The Tcl1 proteins show differences in surface charge distribution and oligomeric state suggesting that they do not interact in the same way with Akt/PKB and other cellular protein(s). John M. Petock, Ivan Y. Torshin, Yuan-Fang Wang, Garrett C. Du Bois, Carlo M. Croce, Robert W. Harrison, and Irene T. Weber Copyright © 2002 John M. Petock et al. All rights reserved. Minimum Information About a Microarray Experiment (MIAME) – Successes, Failures, Challenges Mon, 01 Jan 1900 00:00:00 +0000 The Minimum Information About a Microarray Experiment (known as MIAME) guidelines describe information that needs to be provided to enable the interpretation of the results of a microarray-based experiment unambiguously. The MIAME guidelines were developed by the Microarray Gene Expression Data (MGED) Society. Since the MIAME position paper was published in 2001, it has been cited in the scientific literature well over a thousand times. MIAME has been replicated for many other technologies, the major data repositories are supporting MIAME, and most scientific journals have adopted MIAME guidelines as a requirement for publishing. 
With the advent of new-generation sequencing technology, MIAME faces new challenges. To address this, the MGED Society has proposed new guidelines, i.e., Minimum Information about a high-throughput SeQuencing Experiment (MINSEQE). Here we present an analysis of the reasons for the success of MIAME and discuss where it has failed and the challenges it faces. Alvis Brazma Copyright © 2009 Alvis Brazma. All rights reserved. Delineating Novel Signature Patterns of Altered Gene Expression in Schizophrenia Using Gene Microarrays Mon, 01 Jan 1900 00:00:00 +0000 Schizophrenia is a complex and devastating brain disorder that affects 1% of the population and ranks as one of the most costly disorders to afflict humans. This disorder typically has its clinical onset in late adolescence or early adulthood, presenting as a constellation of delusions and hallucinations (positive symptoms); decreased motivation, emotional expression, and social interactions (negative symptoms); and impaired learning and memory (cognitive symptoms). The etiology of schizophrenia is unknown, but appears to be multifaceted, with genetic and epigenetic developmental factors all implicated. A convergence of observations from clinical, neuroimaging, and anatomical studies has implicated the dorsal prefrontal cortex as a major locus of alterations in schizophrenia. Karoly Mirnics, Frank A. Middleton, David A. Lewis, and Pat Levitt Copyright © 2001 Karoly Mirnics et al. All rights reserved. What Is Artificial about Life? Mon, 01 Jan 1900 00:00:00 +0000 The announcement of “Artificial Life” by the Craig Venter group, and the media stir that arose from the news, provoked thoughts about the current technologies in contemporary science and the cultural tension of such projections on the media.
The increasingly blurred boundaries between specialist and generalist media, while promising a wider appreciation of scientific discovery, potentially allow unrealistic, ideological claims to dictate scientific research. This is particularly evident in biology, where the pervading paradigm is still dominated by a physically naïve reductionism in which the only relevant causative layer is the molecular one. The reductionist hypothesis is that everything one observes is the result of an underlying molecular mechanism almost independent of the context in which it operates. Molecular mechanisms are often necessarily studied in isolation and therefore operate in unnatural conditions. The mechanistic view of biological regulation implies that we think of genes as intelligent agents. Here we try to critically analyze the motivations behind the spread of such unrealistic simplifications. Alessandro Giuliani, Ignazio Licata, Carlo M. Modonesi, and Paolo Crosignani Copyright © 2011 Alessandro Giuliani et al. All rights reserved. Combining the Performance Strengths of the Logistic Regression and Neural Network Models: A Medical Outcomes Approach Mon, 01 Jan 1900 00:00:00 +0000 The assessment of medical outcomes is important in the effort to contain costs, streamline patient management, and codify medical practices. As such, it is necessary to develop predictive models that will make accurate predictions of these outcomes. The neural network methodology has often been shown to perform as well as, if not better than, the logistic regression methodology in terms of sample predictive performance. However, the logistic regression method is capable of providing an explanation regarding the relationship(s) between variables. This explanation is often crucial to understanding the clinical underpinnings of the disease process.
Given the respective strengths of the methodologies in question, the combined use of a statistical (i.e., logistic regression) and machine learning (i.e., neural network) technology in the classification of medical outcomes is warranted under appropriate conditions. The study discusses these conditions and describes an approach for combining the strengths of the models. Wun Wong, Peter J. Fos, and Frederick E. Petry Copyright © 2003 Wun Wong et al. All rights reserved. Lipid Mediator Informatics and Proteomics in Inflammation-Resolution Mon, 01 Jan 1900 00:00:00 +0000 Lipid mediator informatics is an emerging area devoted to the identification of bioactive lipid mediators (LMs) and their biosynthetic profiles and pathways. LM informatics and proteomics applied to inflammation, systems tissues research provides a powerful means of uncovering key biomarkers for novel processes in health and disease. By incorporating them with systems biology analysis, we review here our initial steps toward elucidating relationships among a range of biomolecular classes and provide an appreciation of their roles and activities in the pathophysiology of disease. LM informatics employing liquid chromatography-ultraviolet-tandem mass spectrometry (LC-UV-MS/MS), gas chromatography-mass spectrometry (GC-MS), computer-based automated systems equipped with databases and novel searching algorithms, and enzyme-linked immunosorbent assay (ELISA) to evaluate and profile temporal and spatial production of mediators combined with proteomics at defined points during experimental inflammation and its resolution enable us to identify novel mediators in resolution.
The automated system, including databases and searching algorithms, is crucial for prompt and accurate analysis of these lipid mediators, such as eicosanoids, resolvins, and neuroprotectins, which are biosynthesized from precursor polyunsaturated fatty acids and play key roles in human physiology and many prevalent diseases, especially those related to inflammation. This review presents detailed protocols used in our lab for LM informatics and proteomics using LC-UV-MS/MS, GC-MS, ELISA, novel databases and searching algorithms, and 2-dimensional gel electrophoresis and LC-nanospray-MS/MS peptide mapping. Yan Lu, Song Hong, Katherine Gotlinger, and Charles Serhan Copyright © 2006 Yan Lu et al. All rights reserved.