The Scientific World Journal: Bioinformatics The latest articles from Hindawi © 2017, Hindawi Limited. All rights reserved. Pulse Diagnosis Signals Analysis of Fatty Liver Disease and Cirrhosis Patients by Using Machine Learning Sat, 28 Nov 2015 00:00:00 +0000 Objective. To compare the signals of pulse diagnosis of fatty liver disease (FLD) patients and cirrhosis patients. Methods. After collecting the pulse waves of patients with fatty liver disease, cirrhosis patients, and healthy volunteers, we performed pretreatment and parameter extraction based on harmonic fitting, followed step by step by modeling and identification using unsupervised learning (Principal Component Analysis, PCA) and supervised learning (Least Squares Regression, LS, and Least Absolute Shrinkage and Selection Operator, LASSO) with cross-validation. Results. There is a significant difference between the pulse diagnosis signals of healthy volunteers and those of patients with FLD and cirrhosis, and the result was confirmed by the 3 analysis methods. The identification accuracy of the 1st principal component obtained by PCA is about 75% without any classification information, and the accuracy of supervised learning (LS and LASSO) was more than 93% when 7 parameters were used and 84% when only 2 parameters were used. Conclusion. The method we built in this study, based on the combination of unsupervised learning (PCA) and supervised learning (LS and LASSO), might offer some confidence for the realization of computer-aided diagnosis by pulse diagnosis in TCM. In addition, this study might offer some important evidence for the science of pulse diagnosis in TCM clinical diagnosis. Wang Nanyue, Yu Youhua, Huang Dawei, Xu Bin, Liu Jia, Li Tongda, Xue Liyuan, Shan Zengyu, Chen Yanping, and Wang Jia Copyright © 2015 Wang Nanyue et al. All rights reserved. 
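To make the LASSO step named in the pulse-diagnosis abstract above more concrete, the following is a minimal sketch of how an L1 penalty can drop an uninformative parameter, which is the behaviour behind going from 7 parameters to 2. It is not the authors' implementation: the function names, the toy data, and the penalty value `lam=0.5` are all assumptions chosen for illustration, and the solver is plain cyclic coordinate descent.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; anything inside [-lam, lam] becomes 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred = sum(w[k] * X[i][k] for k in range(p))
                partial_residual = y[i] - pred + w[j] * X[i][j]
                rho += X[i][j] * partial_residual
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (z / n)
    return w


# Hypothetical toy data: the response depends only on the first feature
# (y = 3 * x1); the second feature is uninformative.
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.0], [1.0, -1.0], [2.0, 1.0]]
y = [-6.0, -3.0, 0.0, 3.0, 6.0]
weights = lasso_coordinate_descent(X, y, lam=0.5)
```

On this toy data the weight of the uninformative feature is driven exactly to zero, while the informative weight is shrunk slightly below its least-squares value of 3. In practice the penalty would be chosen by cross-validation, as the abstract indicates.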
Bioinformatics/Medical Informatics in Traditional Medicine and Integrative Medicine Wed, 04 Nov 2015 11:17:30 +0000 Zhaohui Liang, Xiangji Huang, Byeongsang Oh, and Josiah Poon Copyright © 2015 Zhaohui Liang et al. All rights reserved. An Ensemble Learning Based Framework for Traditional Chinese Medicine Data Analysis with ICD-10 Labels Thu, 01 Oct 2015 11:58:09 +0000 Objective. This study aims to establish a model to analyze clinical experience of TCM veteran doctors. We propose an ensemble learning based framework to analyze clinical records with ICD-10 labels information for effective diagnosis and acupoints recommendation. Methods. We propose an ensemble learning framework for the analysis task. A set of base learners composed of decision tree (DT) and support vector machine (SVM) are trained by bootstrapping the training dataset. The base learners are sorted by accuracy and diversity through nondominated sort (NDS) algorithm and combined through a deep ensemble learning strategy. Results. We evaluate the proposed method with comparison to two currently successful methods on a clinical diagnosis dataset with manually labeled ICD-10 information. ICD-10 label annotation and acupoints recommendation are evaluated for three methods. The proposed method achieves an accuracy rate of 88.2%  ±  2.8% measured by zero-one loss for the first evaluation session and 79.6%  ±  3.6% measured by Hamming loss, which are superior to the other two methods. Conclusion. The proposed ensemble model can effectively model the implied knowledge and experience in historic clinical data records. The computational cost of training a set of base learners is relatively low. Gang Zhang, Yonghui Huang, Ling Zhong, Shanxing Ou, Yi Zhang, and Ziping Li Copyright © 2015 Gang Zhang et al. All rights reserved. 
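The bootstrapping step of the ensemble framework described above can be sketched as follows. This is only an illustration under stated assumptions: the paper's base learners are decision trees and SVMs that are ranked by nondominated sort and combined by a deep ensemble strategy, whereas this sketch swaps in one-dimensional threshold stumps and a plain majority vote. All function names and the toy data are hypothetical.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement; redraw until both classes appear."""
    while True:
        sample = [rng.choice(data) for _ in data]
        if {label for _, label in sample} == {0, 1}:
            return sample


def train_stump(sample):
    """Base learner: a threshold halfway between the two class means."""
    xs0 = [x for x, label in sample if label == 0]
    xs1 = [x for x, label in sample if label == 1]
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2.0


def train_ensemble(data, n_learners=11, seed=0):
    """Train each base learner on its own bootstrap replicate of the data."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(data, rng)) for _ in range(n_learners)]


def predict(thresholds, x):
    """Majority vote: each stump votes for class 1 when x exceeds its threshold."""
    votes = sum(1 for t in thresholds if x > t)
    return 1 if votes * 2 > len(thresholds) else 0


# Hypothetical toy data: class 0 takes values 0..9, class 1 takes values 20..29.
data = [(float(v), 0) for v in range(10)] + [(float(v), 1) for v in range(20, 30)]
ensemble = train_ensemble(data)
```

Because every replicate contains both classes, each stump's threshold falls between the two groups, so `predict(ensemble, 3.0)` gives 0 and `predict(ensemble, 25.0)` gives 1. Selecting learners by accuracy and diversity, as the abstract describes, would sit between training and voting.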
ISMAC: An Intelligent System for Customized Clinical Case Management and Analysis Thu, 01 Oct 2015 11:24:20 +0000 Clinical cases are primary and vital evidence for Traditional Chinese Medicine (TCM) clinical research. A great deal of medical knowledge is hidden in the clinical cases of highly experienced TCM practitioners. With a deep Chinese cultural background and years of clinical experience, an experienced TCM specialist usually has his or her own unique clinical pattern and diagnostic approach. Preserving the large number of clinical cases of experienced TCM practitioners, as well as exploring the knowledge inherent in them, is therefore an important but arduous task. The novel system ISMAC (Intelligent System for Management and Analysis of Clinical Cases in TCM) is designed and implemented for customized management and intelligent analysis of TCM clinical data. Customized templates with standard and expert-standard symptoms, diseases, syndromes, and Chinese Medicine Formula (CMF) are constructed in ISMAC, according to the clinical diagnosis and treatment characteristics of each TCM specialist. With these templates, clinical cases are archived in a way that maintains their original characteristics. Various data analysis and mining methods, grouped as Basic Analysis, Association Rule, Feature Reduction, Cluster, Pattern Classification, and Pattern Prediction, are implemented in the system. With a flexible dataset retrieval mechanism, ISMAC is a powerful and convenient system for clinical case analysis and clinical knowledge discovery. Mingyu You, Chong Chen, Guo-Zheng Li, Shi-Xing Yan, Sheng Sun, Xue-Qiang Zeng, Qing-Ce Zhao, Liao-Yu Xu, and Su-Ying Huang Copyright © 2015 Mingyu You et al. All rights reserved. Detecting Disease in Radiographs with Intuitive Confidence Thu, 01 Oct 2015 09:24:22 +0000 This paper argues in favor of a specific type of confidence for use in computer-aided diagnosis and disease classification, namely, sine/cosine values of angles represented by points on the unit circle. 
The paper shows how this confidence is motivated by Chinese medicine and how sine/cosine values are directly related with the two forces Yin and Yang. The angle for which sine and cosine are equal (45°) represents the state of equilibrium between Yin and Yang, which is a state of nonduality that indicates neither normality nor abnormality in terms of disease classification. The paper claims that the proposed confidence is intuitive and can be readily understood by physicians. The paper underpins this thesis with theoretical results in neural signal processing, stating that a sine/cosine relationship between the actual input signal and the perceived (learned) input is key to neural learning processes. As a practical example, the paper shows how to use the proposed confidence values to highlight manifestations of tuberculosis in frontal chest X-rays. Stefan Jaeger Copyright © 2015 Stefan Jaeger. All rights reserved. Patterns Exploration on Patterns of Empirical Herbal Formula of Chinese Medicine by Association Rules Thu, 01 Oct 2015 09:22:18 +0000 Background. In this study, we use association rules to explore the latent rules and patterns of prescribing and adjusting the ingredients of herbal decoctions based on empirical herbal formula of Chinese Medicine (CM). Materials and Methods. The consideration and development of CM prescriptions based on the knowledge of CM doctors are analyzed. The study contained three stages. The first stage is to identify the chief symptoms to a specific empirical herbal formula, which can serve as the key indication for herb addition and cancellation. The second stage is to conduct a case study on the empirical CM herbal formula for insomnia. Doctors will add extra ingredients or cancel some of them by CM syndrome diagnosis. The last stage of the study is to divide the observed cases into the effective group and ineffective group based on the assessed clinical effect by doctors. 
The patterns during the diagnosis and treatment are selected by the applied algorithm and the relations between clinical symptoms or indications and herb choosing principles will be selected by the association rules algorithm. Results. A total of 40 patients were observed in this study: treatment was considered effective in 28 patients and ineffective in the remaining 12. 206 patterns related to clinical indications of Chinese Medicine were checked and screened with each observed case. In the analysis of the effective group, we used the algorithm of association rules to select combinations between 28 herbal adjustment strategies of the empirical herbal formula and the 190 patterns of individual clinical manifestations. During this stage, 11 common patterns were eliminated and 5 major symptoms for insomnia remained. 12 association rules were identified, which included 5 herbal adjustment strategies. Conclusion. The association rules method is an effective algorithm to explore the latent relations between clinical indications and herbal adjustment strategies for the study of empirical herbal formulas. Li Huang, Jiamin Yuan, Zhimin Yang, Fuping Xu, and Chunhua Huang Copyright © 2015 Li Huang et al. All rights reserved. Acupuncture for Vascular Dementia: A Pragmatic Randomized Clinical Trial Thu, 01 Oct 2015 09:20:45 +0000 In this trial, patients who agreed to random assignment were allocated to a randomized acupuncture group (R-acupuncture group) or control group. Those who declined randomization were assigned to a nonrandomized acupuncture group (NR-acupuncture group). Patients in the R-acupuncture group and NR-acupuncture group received up to 21 acupuncture sessions during a period of 6 weeks plus routine care, while the control group received routine care alone. 
Cognitive function, activities of daily living, and quality of life were assessed by mini-mental state examination (MMSE), Activities of Daily Living Scale (ADL), and dementia quality of life questionnaire (DEMQOL), respectively. All the data were collected at baseline, after 6-week treatment, and after 4-week follow-up. No significant differences in MMSE scores were observed among the three groups, but the pooled-acupuncture group had a significantly higher score than the control group. Compared to the control group, the ADL score significantly decreased in the NR-acupuncture group and the pooled-acupuncture group. For DEMQOL scores, no significant differences were observed among the three groups or between the pooled-acupuncture group and the control group. Adding acupuncture to routine care may have beneficial effects on cognitive status and activities of daily living but limited efficacy on health-related quality of life in VaD patients. Guang-Xia Shi, Qian-Qian Li, Bo-Feng Yang, Yan Liu, Li-Ping Guan, Meng-Meng Wu, Lin-Peng Wang, and Cun-Zhi Liu Copyright © 2015 Guang-Xia Shi et al. All rights reserved. Syndrome Differentiation Analysis on Mars500 Data of Traditional Chinese Medicine Thu, 01 Oct 2015 08:27:53 +0000 The Mars500 study was a psychological and physiological isolation experiment conducted by Russia, the European Space Agency, and China, in preparation for an unspecified future manned spaceflight to the planet Mars. Its intention was to yield valuable psychological and medical data on the effects of the planned long-term deep space mission. In this paper, we present data mining methods to mine medical data collected from the crew consisting of six spaceman volunteers. The synthesis of the four diagnostic methods of TCM, inspection, listening, inquiry, and palpation, is used in our syndrome differentiation. We adopt statistical methods to describe the regular pattern of syndrome factors of the spaceman volunteers. 
Hybrid optimization based multilabel (HOML) is used as feature selection method and multilabel k-nearest neighbors (ML-KNN) is applied. According to the syndrome factor statistical result, we find that qi deficiency is a base syndrome pattern throughout the entire experiment process and, at the same time, there are different associated syndromes such as liver depression, spleen deficiency, dampness stagnancy, and yin deficiency, due to differences of individual situation. With feature selection, we screen out ten key factors which are essential to syndrome differentiation in TCM. The average precision of multilabel classification model reaches 80%. Yong-Zhi Li, Guo-Zheng Li, Jian-Yi Gao, Zhi-Feng Zhang, Quan-Chun Fan, Jia-Tuo Xu, Gui-E Bai, Kai-Xian Chen, Hong-Zhi Shi, Sheng Sun, Yu Liu, Feng-Feng Shao, Tao Mi, Xin-Hong Jia, Shuang Zhao, Jia-Chang Chen, Jun-Lian Liu, Yu-Meng Guo, and Li Ping Tu Copyright © 2015 Yong-Zhi Li et al. All rights reserved. Researches on Mathematical Relationship of Five Elements of Containing Notes and Fibonacci Sequence Modulo 5 Thu, 01 Oct 2015 08:17:24 +0000 Considering the five periods and six qi’s theory in TCM almost shares a common basis of stem-branch system with the five elements of containing notes, studying the principle or mathematical structure behind the five elements of containing notes can surely bring a novel view for the five periods and six qi’s researches. By analyzing typical mathematical rules included in He tu, Luo shu, and stem-branch theory in TCM as well as the Fibonacci sequence especially widely existent in the biological world, novel researches are performed on mathematical relationship between the five elements of containing notes and the Fibonacci sequence modulo 5. Enlightened by elementary Yin or Yang number grouping principle of He tu, Luo shu, the 12534 and 31542 key number series of Fibonacci sequence modulo 5 are obtained. 
And three new arrangements about the five elements of containing notes are then introduced, which have shown close relationship with the two obtained key subsequences of the Fibonacci sequence modulo 5. The novel discovery is quite helpful to recover the scientific secret of the five periods and six qi’s theory in TCM as well as that of whole traditional Chinese culture system, but more data is needed to elucidate the TCM theory further. Zhaoxue Chen Copyright © 2015 Zhaoxue Chen. All rights reserved. Standardization of Syndrome Differentiation Defined by Traditional Chinese Medicine in Operative Breast Cancer: A Modified Delphi Study Thu, 01 Oct 2015 06:49:20 +0000 Objective. The aim of this study was to establish the standardization of syndrome differentiation of operative breast cancer treated with Traditional Chinese Medicine (TCM) by the modified Delphi method. Method. A literature search for standardization of syndrome differentiation of operative breast cancer was conducted and eligible articles were identified in indexed databases from 1982 to 2013. We carried out two rounds of investigation between March and October 2013 and organized 20 experts who focused on TCM or integrative medicine in breast cancer research. Experts’ judgments were collected via posted questionnaires or e-mail. A final evaluation was carried out after the end of both rounds. Result. The response ratio of the 1st round investigation reached 100%, and two experts were excluded due to the uncompleted questionnaire. The 2nd round investigation was completed by 18 experts in the 1st round panel board. In both rounds, the experts agreed that the stage of breast cancer defined by TCM could be divided into the perioperation period, the perichemotherapy period, the periradiotherapy period, and the consolidation period. Conclusion. We identified the feasibility and reasonability to establish the standardization of syndrome differentiation of operative breast cancer. 
According to the suggestions from experts in our Delphi study, we preliminarily established the TCM standard of syndrome differentiation based on different treatment stages of operative breast cancer. Qianqian Guo and Qianjun Chen Copyright © 2015 Qianqian Guo and Qianjun Chen. All rights reserved. Using Bioinformatics Approach to Explore the Pharmacological Mechanisms of Multiple Ingredients in Shuang-Huang-Lian Thu, 01 Oct 2015 06:38:41 +0000 Owing to its proven clinical efficacy, Shuang-Huang-Lian (SHL) has been developed into a variety of dosage forms. However, in-depth research on the targets and pharmacological mechanisms of SHL preparations has been scarce. In the present study, bioinformatics approaches were adopted to integrate relevant data and biological information. As a result, a PPI network was built and the common topological parameters were characterized. The results suggested that the PPI network of SHL exhibited a scale-free property and modular architecture. The drug target network of SHL was structured with 21 functional modules. According to certain modules and the distribution of pharmacological effects, an antitumor effect and potential drug targets were predicted. A biological network which contained 26 subnetworks was constructed to elucidate the antipneumonia mechanism of SHL. We also extracted the subnetwork to explicitly display the pathway where one effective component acts on the pneumonia related targets. In conclusion, a bioinformatics approach was established for exploring the drug targets, pharmacological activity distribution, and effective components of SHL and its antipneumonia mechanism. Above all, we identified the effective components and disclosed the mechanism of SHL from a systems perspective. Bai-xia Zhang, Jian Li, Hao Gu, Qiang Li, Qi Zhang, Tian-jiao Zhang, Yun Wang, and Cheng-ke Cai Copyright © 2015 Bai-xia Zhang et al. All rights reserved. Predicting Metabolic Syndrome Using the Random Forest Method Tue, 28 Jul 2015 12:20:04 +0000 Aims. 
This study proposes a computational method for determining the prevalence of metabolic syndrome (MS) and predicting its occurrence using the National Cholesterol Education Program Adult Treatment Panel III (NCEP ATP III) criteria. The Random Forest (RF) method is also applied to identify significant health parameters. Materials and Methods. We used data from 5,646 adults aged 18–78 years residing in Bangkok who had received an annual health check-up in 2008. MS was identified using the NCEP ATP III criteria. The RF method was applied to predict the occurrence of MS and to identify important health parameters surrounding this disorder. Results. The overall prevalence of MS was 23.70% (34.32% for males and 17.74% for females). RF accuracy for predicting MS in an adult Thai population was 98.11%. Further, based on RF, triglyceride levels were the most important health parameter associated with MS. Conclusion. RF was shown to predict MS in an adult Thai population with an accuracy >98%, and triglyceride levels were identified as the most informative variable associated with MS. Therefore, using RF to predict MS may be beneficial in identifying MS status for preventing the development of diabetes mellitus and cardiovascular diseases. Apilak Worachartcheewan, Watshara Shoombuatong, Phannee Pidetcha, Wuttichai Nopnithipat, Virapong Prachayasittikul, and Chanin Nantasenamat Copyright © 2015 Apilak Worachartcheewan et al. All rights reserved. Application of Machine Learning Method in Genomics and Proteomics Sun, 19 Apr 2015 11:44:10 +0000 Hao Lin, Wei Chen, Ramu Anandakrishnan, and Dariusz Plewczynski Copyright © 2015 Hao Lin et al. All rights reserved. Briefing in Application of Machine Learning Methods in Ion Channel Prediction Thu, 16 Apr 2015 14:21:04 +0000 In cells, ion channels are one of the most important classes of membrane proteins which allow inorganic ions to move across the membrane. 
A wide range of biological processes involve and are regulated by the opening and closing of ion channels. Ion channels can be classified into numerous classes and different types of ion channels exhibit different functions. Thus, the correct identification of ion channels and their types using computational methods will provide in-depth insights into their function in various biological processes. In this review, we will briefly introduce and discuss the recent progress in ion channel prediction using machine learning methods. Hao Lin and Wei Chen Copyright © 2015 Hao Lin and Wei Chen. All rights reserved. In Silico Approach towards Designing Virtual Oligopeptides for HRSV Thu, 27 Nov 2014 00:10:14 +0000 HRSV (human respiratory syncytial virus) is a serious cause of lower respiratory tract illness in infants and young children. Designing inhibitors from the proteins involved in the virus replication and infection process provides targets for new therapeutic treatments. In the present study, in silico docking was performed using motavizumab as a template to design motavizumab derived oligopeptides for developing novel anti-HRSV agents. Additional simulations were conducted to study the conformational propensities of the oligopeptides and confirmed the hypothesis that the designed oligopeptide is highly flexible and capable of assuming a stable conformation. Our study demonstrated the best specific interaction of GEKKLVEAPKS oligopeptide for glycoprotein strain A among various screened oligopeptides. Encouraged by the results, we expect that the proposed scheme will provide rational choices for antibody reengineering, which is useful for systematically identifying possible ways to improve the efficacy of existing antibody drugs. Ruchi Jain and Shanmughavel Piramanayagam Copyright © 2014 Ruchi Jain and Shanmughavel Piramanayagam. All rights reserved. 
Cheminformatics Models for Inhibitors of Schistosoma mansoni Thioredoxin Glutathione Reductase Tue, 25 Nov 2014 13:10:10 +0000 Schistosomiasis is a neglected tropical disease caused by the parasite Schistosoma mansoni and affects over 200 million people annually. With the recent emergence of drug resistance, there is an urgent need to discover novel therapeutic options to control the disease. The multifunctional protein thioredoxin glutathione reductase (TGR), an essential enzyme for the survival of the pathogen in the redox environment, has been actively explored as a potential drug target. The recent availability of small-molecule screening datasets against this target provides a unique opportunity to learn molecular properties and apply computational models for discovery of activities in large molecular libraries. Such a prioritisation approach could have the potential to reduce the cost of failures in lead discovery. A supervised learning approach was employed to develop a cost sensitive classification model to evaluate the biological activity of the molecules. Random forest was identified as the best of the classifiers tested, with an accuracy of around 80 percent. An independent maximally occurring substructure analysis revealed 10 highly enriched scaffolds in the actives dataset, and docking of these scaffolds against the target was also performed. We show that a combined approach of machine learning and other cheminformatics approaches such as substructure comparison and molecular docking is efficient for prioritising molecules from large molecular datasets. Sonam Gaba, Salma Jamal, Open Source Drug Discovery Consortium, and Vinod Scaria Copyright © 2014 Sonam Gaba et al. All rights reserved. 
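One common way to make a classifier cost sensitive, as mentioned in the abstract above, is to pick the prediction with the lower expected misclassification cost instead of thresholding the predicted probability at 0.5. The sketch below illustrates only that decision rule. It is not taken from the paper: the function name, the cost values, and the example probabilities are assumptions.

```python
def predict_active(p_active, cost_false_negative, cost_false_positive):
    """Return True when calling a molecule 'active' has the lower expected cost.

    p_active is the classifier's estimated probability that the molecule is
    active (for example, the fraction of random forest trees voting 'active').
    """
    # Calling it inactive is wrong with probability p_active (a missed active).
    cost_if_called_inactive = p_active * cost_false_negative
    # Calling it active is wrong with probability 1 - p_active (a false alarm).
    cost_if_called_active = (1.0 - p_active) * cost_false_positive
    return cost_if_called_active < cost_if_called_inactive


# Hypothetical costs: missing a true active is taken to be 5 times worse than
# a false alarm, so the effective probability threshold drops from 1/2 to 1/6.
likely_enough = predict_active(0.2, cost_false_negative=5.0, cost_false_positive=1.0)
too_unlikely = predict_active(0.1, cost_false_negative=5.0, cost_false_positive=1.0)
```

With these assumed costs a molecule with a 20% predicted chance of activity is still flagged, while one at 10% is not. This asymmetry suits screening data in which actives are rare and expensive to miss.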
Simulation of Contrast Agent Transport in Arteries with Multilayer Arterial Wall: Impact of Arterial Transmural Transport on the Bolus Delay and Dispersion Mon, 17 Nov 2014 12:45:33 +0000 One assumption of DSC-MRI is that the injected contrast agent is kept totally intravascular and the arterial wall is impermeable to contrast agent. This assumption is unrealistic, because a small contrast agent such as Gd-DTPA can leak into the arterial wall. To investigate whether the assumption is nevertheless valid for the estimation of the delay and dispersion of the contrast agent bolus, we simulated flow and Gd-DTPA transport in a model with a multilayer arterial wall and analyzed the bolus delay and dispersion, quantified by mean vascular transit time (MVTT) and the variance of the vascular transport function. Factors that may affect Gd-DTPA transport, and hence the delay and dispersion, were further investigated, such as the integrity of the endothelium and disturbed flow. The results revealed that arterial transmural transport would slightly affect MVTT and moderately increase the variance. In addition, although the integrity of the endothelium can significantly affect the accumulation of contrast agent in the arterial wall, it had small effects on the bolus delay and dispersion. However, disturbed flow would significantly increase both MVTT and the variance. In conclusion, arterial transmural transport may have a small effect on the bolus delay and dispersion when compared to the flow pattern in the artery. Min Xu, Xiao Liu, Ang Li, Yubo Fan, Anqiang Sun, Xiaoyan Deng, and Deyu Li Copyright © 2014 Min Xu et al. All rights reserved. Supervised Wavelet Method to Predict Patient Survival from Gene Expression Data Mon, 03 Nov 2014 13:34:31 +0000 In microarray studies, the number of samples is relatively small compared to the number of genes per sample. An important aspect of microarray studies is the prediction of patient survival based on their gene expression profile. 
This naturally calls for the use of a dimension reduction procedure together with the survival prediction model. In this study, a new method based on combining wavelet approximation coefficients and Cox regression was presented. The proposed method was compared with supervised principal component and supervised partial least squares methods. The different fitted Cox models based on supervised wavelet approximation coefficients, the top number of supervised principal components, and partial least squares components were applied to the data. The results showed that the prediction performance of the Cox model based on supervised wavelet feature extraction was superior to that of the models based on supervised principal components and partial least squares components. The results suggested the possibility of developing new tools based on wavelets for the dimensionality reduction of microarray data sets in the context of survival analysis. Maryam Farhadian, Paulo J. G. Lisboa, Abbas Moghimbeigi, Jalal Poorolajal, and Hossein Mahjub Copyright © 2014 Maryam Farhadian et al. All rights reserved. Exploration of Potential Roles of a New LOXL2 Splicing Variant Using Network Knowledge in Esophageal Squamous Cell Carcinoma Sun, 31 Aug 2014 06:42:06 +0000 LOXL2 (lysyl oxidase-like 2), an enzyme that catalyzes oxidative deamination of lysine residue, is upregulated in esophageal squamous cell carcinoma (ESCC). A LOXL2 splice variant LOXL2-e13 and its wild type were overexpressed in ESCC cells followed by microarray analyses. In this study, we explored the potential role and molecular mechanism of LOXL2-e13 based on known protein-protein interactions (PPIs), following microarray analysis of KYSE150 ESCC cells overexpressing a LOXL2 splice variant, denoted by LOXL2-e13, or its wild-type counterpart. The differentially expressed genes (DEGs) of LOXL2-WT and LOXL2-e13 were applied to generate individual PPI subnetworks in which hundreds of DEGs interacted with thousands of other proteins. 
These two DEG groups were annotated by Functional Annotation Chart analysis in the DAVID bioinformatics database and compared. The comparison revealed many specific annotations indicating a potential specific role or mechanism for LOXL2-e13. The DEGs of LOXL2-e13, compared with its wild type, were prioritized by the Random Walk with Restart algorithm. Several tumor-related genes such as ERO1L, ITGA3, and MAPK8 were found closest to LOXL2-e13. These results provide helpful information for subsequent experimental identification of the specific biological roles and molecular mechanisms of LOXL2-e13. Our study also provides a workflow to identify potential roles of splice variants with large scale data. Bing-Li Wu, Guo-Qing Lv, Hai-Ying Zou, Ze-Peng Du, Jian-Yi Wu, Pi-Xian Zhang, Li-Yan Xu, and En-Min Li Copyright © 2014 Bing-Li Wu et al. All rights reserved. Amino Acid Sequence and Structural Comparison of BACE1 and BACE2 Using Evolutionary Trace Method Thu, 28 Aug 2014 05:59:58 +0000 Beta-amyloid precursor protein cleavage enzyme 1 (BACE1) and beta-amyloid precursor protein cleavage enzyme 2 (BACE2), members of the aspartyl protease family, are close homologues and have high similarity in their protein crystal structures. However, their enzymatic properties differ, leading to disparate clinical consequences. In order to identify the residues that are responsible for such differences, we used the evolutionary trace (ET) method to compare the amino acid conservation patterns of BACE1 and BACE2 in several mammalian species. We found that, in the BACE1 and BACE2 structures, most of the ligand binding sites are conserved, which reflects their enzymatic properties as members of the aspartyl protease family. The other conserved residues are more or less randomly localized in other parts of the structures. Four group-specific residues were identified at the ligand binding site of BACE1 and BACE2. 
We postulated that these residues would be essential for selectivity of BACE1 and BACE2 biological functions and could be sites of interest for the design of selective inhibitors targeting either BACE1 or BACE2. Hoda Mirsafian, Adiratna Mat Ripen, Amir Feisal Merican, and Saharuddin Bin Mohamad Copyright © 2014 Hoda Mirsafian et al. All rights reserved. Prediction of DNase I Hypersensitive Sites by Using Pseudo Nucleotide Compositions Tue, 19 Aug 2014 06:09:13 +0000 DNase I hypersensitive sites (DHS) are associated with a wide variety of regulatory DNA elements. Knowledge about the locations of DHS is helpful for deciphering the function of noncoding genomic regions. With the rapid growth of genome sequence data in the postgenomic age, it is highly desirable to develop cost-effective computational methods to identify DHS. In the present work, a support vector machine based model was proposed to identify DHS by using the pseudo dinucleotide composition. In the jackknife test, the proposed model obtained an accuracy of 83%, which is competitive with that of the existing method. This result suggests that the proposed model may become a useful tool for DHS identification. Pengmian Feng, Ning Jiang, and Nan Liu Copyright © 2014 Pengmian Feng et al. All rights reserved. Database Constraints Applied to Metabolic Pathway Reconstruction Tools Sun, 17 Aug 2014 12:20:08 +0000 Our group developed two biological applications, Biblio-MetReS and Homol-MetReS, accessing the same database of organisms with annotated genes. Biblio-MetReS is a data-mining application that facilitates the reconstruction of molecular networks based on automated text-mining analysis of published scientific literature. Homol-MetReS allows functional (re)annotation of proteomes, to properly identify both the individual proteins involved in the process(es) of interest and their function. It also enables the sets of proteins involved in the process(es) in different organisms to be compared directly. 
The efficiency of these biological applications is directly related to the design of the shared database. We classified and analyzed the different kinds of access to the database. Based on this study, we tried to adjust and tune the configurable parameters of the database server to reach the best performance of the communication data link to/from the database system. Different database technologies were analyzed. We started the study with a public relational SQL database, MySQL. Then, the same database was implemented by a MapReduce-based database named HBase. The results indicated that the standard configuration of MySQL gives an acceptable performance for low or medium size databases. Nevertheless, tuning database parameters can greatly improve the performance and lead to very competitive runtimes. Jordi Vilaplana, Francesc Solsona, Ivan Teixido, Anabel Usié, Hiren Karathia, Rui Alves, and Jordi Mateo Copyright © 2014 Jordi Vilaplana et al. All rights reserved. VibrioBase: A Model for Next-Generation Genome and Annotation Database Development Mon, 04 Aug 2014 05:16:46 +0000 To facilitate the ongoing research of Vibrio spp., a dedicated platform for the Vibrio research community is needed to host the fast-growing amount of genomic data and facilitate the analysis of these data. We present VibrioBase, a resource platform providing all basic features of a sequence database with the addition of unique analysis tools which could be valuable for the Vibrio research community. VibrioBase currently houses a total of 252 Vibrio genomes, presented in a user-friendly manner to enable the analysis of these genomic data, particularly in the field of comparative genomics. Besides general data browsing features, VibrioBase offers analysis tools such as BLAST interfaces and the JBrowse genome browser. 
Other important features of this platform include our newly developed in-house tools, the pairwise genome comparison (PGC) tool, and pathogenomics profiling tool (PathoProT). The PGC tool is useful in the identification and comparative analysis of two genomes, whereas PathoProT is designed for comparative pathogenomics analysis of Vibrio strains. Both of these tools will enable researchers with little experience in bioinformatics to get meaningful information from Vibrio genomes with ease. We have tested the validity and suitability of these tools and features for use in the next-generation database development. Siew Woh Choo, Hamed Heydari, Tze King Tan, Cheuk Chuen Siow, Ching Yew Beh, Wei Yee Wee, Naresh V. R. Mutha, Guat Jah Wong, Mia Yang Ang, and Amir Hessam Yazdi Copyright © 2014 Siew Woh Choo et al. All rights reserved. Molecular Phylogeny and Predicted 3D Structure of Plant beta-D-N-Acetylhexosaminidase Sun, 20 Jul 2014 07:34:30 +0000 beta-D-N-Acetylhexosaminidase, a family 20 glycosyl hydrolase, catalyzes the removal of β-1,4-linked N-acetylhexosamine residues from oligosaccharides and their conjugates. We constructed a phylogenetic tree of β-hexosaminidases to analyze the evolutionary history and predicted functions of plant hexosaminidases. Phylogenetic analysis reveals the complex history of evolution of plant β-hexosaminidase that can be described by gene duplication events. The 3D structure of tomato β-hexosaminidase (β-Hex-Sl) was predicted by homology modeling using 1now as a template. Structural conformity studies of the best fit model showed that more than 98% of the residues lie inside the favoured and allowed regions where only 0.9% lie in the unfavourable region. Predicted 3D structure contains 531 amino acid residues with glycosyl hydrolase20b domain-I and glycosyl hydrolase20 superfamily domain-II including the (β/α)8 barrel in the central part. The α and β contents of the modeled structure were found to be 33.3% and 12.2%, respectively. 
Eleven amino acids were found to be involved in the ligand-binding site; Asp(330) and Glu(331) could play important roles in enzyme-catalyzed reactions. The predicted model provides a structural framework that can act as a guide to develop a hypothesis for β-Hex-Sl mutagenesis experiments for exploring the functions of this class of enzymes in the plant kingdom. Md. Anowar Hossain and Hairul Azman Roslan Copyright © 2014 Md. Anowar Hossain and Hairul Azman Roslan. All rights reserved. Protein Binding Site Prediction by Combining Hidden Markov Support Vector Machine and Profile-Based Propensities Mon, 14 Jul 2014 00:00:00 +0000 Identification of protein binding sites is critical for studying the function of the proteins. In this paper, we proposed a method for protein binding site prediction, which combined the order profile propensities and hidden Markov support vector machine (HM-SVM). This method applied the sequential labeling technique to the field of protein binding site prediction. The input features of HM-SVM include the profile-based propensities, the Position-Specific Score Matrix (PSSM), and Accessible Surface Area (ASA). When tested on different data sets, the proposed method showed promising results and outperformed some closely related methods by more than 10% in terms of AUC. Bin Liu, Bingquan Liu, Fule Liu, and Xiaolong Wang Copyright © 2014 Bin Liu et al. All rights reserved. acACS: Improving the Prediction Accuracy of Protein Subcellular Locations and Protein Classification by Incorporating the Average Chemical Shifts Composition Wed, 02 Jul 2014 06:48:12 +0000 The chemical shift is sensitive to changes in the local environments and can report the structural changes. The structure information of a protein can be represented by the average chemical shifts (ACS) composition, which has been broadly applied for enhancing the prediction accuracy in protein subcellular locations and protein classification. 
However, different kinds of ACS composition can solve different problems. We established an online web server named acACS, which can convert secondary structure into average chemical shift and then compose the vector for representing a protein by using the algorithm of auto covariance. Our solution is easy to use and can meet the needs of users. Guo-Liang Fan, Yan-Ling Liu, Yong-Chun Zuo, Han-Xue Mei, Yi Rang, Bao-Yan Hou, and Yan Zhao Copyright © 2014 Guo-Liang Fan et al. All rights reserved. Prediction of Four Kinds of Simple Supersecondary Structures in Protein by Using Chemical Shifts Wed, 18 Jun 2014 09:12:09 +0000 Knowledge of supersecondary structures can provide important information about the spatial structure of a protein. Some approaches have been developed for the prediction of protein supersecondary structure. However, the features used by these approaches are primarily based on amino acid sequences. In this study, a novel model is presented to predict protein supersecondary structure by use of chemical shifts (CSs) information derived from nuclear magnetic resonance (NMR) spectroscopy. Using these CSs as inputs of the method of quadratic discriminant analysis (QD), we achieve an overall prediction accuracy of 77.3%, which is competitive with the same method for predicting supersecondary structures from amino acid compositions in threefold cross-validation. Moreover, our finding suggests that the combined use of different chemical shifts will influence the accuracy of prediction. Feng Yonge Copyright © 2014 Feng Yonge. All rights reserved. Nonlinear Quantitative Radiation Sensitivity Prediction Model Based on NCI-60 Cancer Cell Lines Tue, 17 Jun 2014 00:00:00 +0000 We proposed a nonlinear model to perform a novel quantitative radiation sensitivity prediction. We used the NCI-60 panel, which consists of nine different cancer types, as the platform to train our model. 
Important radiation therapy (RT) related genes were selected by significance analysis of microarrays (SAM). Orthogonal latent variables (LVs) were then extracted by the partial least squares (PLS) method as the new compressive input variables. Finally, a support vector machine (SVM) regression model was trained with these LVs to predict the SF2 (the surviving fraction of cells after a radiation dose of 2 Gy γ-ray) values of the cell lines. Comparison with the published results showed significant improvement of the new method in various ways: (a) reducing the root mean square error (RMSE) of the radiation sensitivity prediction model from 0.20 to 0.011; and (b) improving prediction accuracy from 62% to 91%. To test the predictive performance of the gene signature, three different types of cancer patient datasets were used. Survival analysis across these different types of cancer patients strongly confirmed the potential clinical utility of the signature genes as a general prognosis platform. The gene regulatory network analysis identified six hub genes that are involved in canonical cancer pathways. Chunying Zhang, Luc Girard, Amit Das, Sun Chen, Guangqiang Zheng, and Kai Song Copyright © 2014 Chunying Zhang et al. All rights reserved. An Empirical Study of Different Approaches for Protein Classification Sun, 15 Jun 2014 12:03:31 +0000 Many domains would benefit from reliable and efficient systems for automatic protein classification. An area of particular interest in recent studies on automatic protein classification is the exploration of new methods for extracting features from a protein that work well for specific problems. These methods, however, are not generalizable and have proven useful in only a few domains. Our goal is to evaluate several feature extraction approaches for representing proteins by testing them across multiple datasets. 
Different types of protein representations are evaluated: those starting from the position specific scoring matrix of the proteins (PSSM), those derived from the amino-acid sequence, two matrix representations, and features taken from the 3D tertiary structure of the protein. We also test new variants of protein descriptors. We develop our system experimentally by comparing and combining different descriptors taken from the protein representations. Each descriptor is used to train a separate support vector machine (SVM), and the results are combined by sum rule. Some stand-alone descriptors work well on some datasets but not on others. Through fusion, the different descriptors provide a performance that works well across all tested datasets, in some cases performing better than the state-of-the-art. Loris Nanni, Alessandra Lumini, and Sheryl Brahnam Copyright © 2014 Loris Nanni et al. All rights reserved. Study of Query Expansion Techniques and Their Application in the Biomedical Information Retrieval Sun, 02 Mar 2014 00:00:00 +0000 Information Retrieval focuses on finding documents whose content matches a user query from a large document collection. As formulating well-designed queries is difficult for most users, it is necessary to use query expansion to retrieve relevant information. Query expansion techniques are widely applied for improving the efficiency of the textual information retrieval systems. These techniques help to overcome vocabulary mismatch issues by expanding the original query with additional relevant terms and reweighting the terms in the expanded query. In this paper, different text preprocessing and query expansion approaches are combined to improve the documents initially retrieved by a query in a scientific document database. A corpus belonging to MEDLINE, called Cystic Fibrosis, is used as a knowledge source. 
Experimental results show that the proposed combinations of techniques greatly enhance the efficiency obtained by traditional queries. A. R. Rivas, E. L. Iglesias, and L. Borrajo Copyright © 2014 A. R. Rivas et al. All rights reserved. A Comparative Analysis of Synonymous Codon Usage Bias Pattern in Human Albumin Superfamily Thu, 20 Feb 2014 11:54:35 +0000 Synonymous codon usage bias is an inevitable phenomenon in organismic taxa across the three domains of life. Though the frequency of codon usage is not equal across species and within the genome of the same species, the phenomenon is nonrandom and is tissue-specific. Several factors such as GC content, nucleotide distribution, protein hydropathy, protein secondary structure, and translational selection are reported to contribute to codon usage preference. The synonymous codon usage patterns can be helpful in revealing the expression pattern of genes as well as the evolutionary relationship between the sequences. In this study, synonymous codon usage bias patterns were determined for the evolutionarily close proteins of albumin superfamily, namely, albumin, α-fetoprotein, afamin, and vitamin D-binding protein. Our study demonstrated that the genes of the four albumin superfamily members have low GC content and high values of effective number of codons (ENC) suggesting high expressivity of these genes and less bias in codon usage preferences. This study also provided evidence that the albumin superfamily members are not subjected to mutational selection pressure. Hoda Mirsafian, Adiratna Mat Ripen, Aarti Singh, Phaik Hwan Teo, Amir Feisal Merican, and Saharuddin Bin Mohamad Copyright © 2014 Hoda Mirsafian et al. All rights reserved. Biomedical Informatics and Computational Biology for High-Throughput Data Analysis Wed, 12 Feb 2014 08:06:41 +0000 Bairong Shen, Jian Ma, Jiajun Wang, and Junbai Wang Copyright © 2014 Bairong Shen et al. All rights reserved. 
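As a rough illustration of the quantities named in the codon usage abstract above, the following Python sketch computes GC content and per-codon counts for a coding sequence. It also computes relative synonymous codon usage (RSCU), a common and simpler bias measure. RSCU is an assumption made here for illustration; it is not the ENC statistic reported in the abstract, and nothing below reproduces the authors' pipeline. The toy sequence and the function names are hypothetical.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in seq) / len(seq)


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons; trailing bases that do not fill a codon are ignored."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def rscu(counts: Counter, synonymous_codons: list[str]) -> dict[str, float]:
    """Relative synonymous codon usage for one amino acid.

    RSCU = observed count / (total count for the amino acid / number of
    synonymous codons). A value of 1.0 means no bias for that codon.
    """
    total = sum(counts[c] for c in synonymous_codons)
    if total == 0:
        return {c: 0.0 for c in synonymous_codons}
    n = len(synonymous_codons)
    return {c: counts[c] * n / total for c in synonymous_codons}


# Hypothetical toy coding sequence: Met, Ala, Ala, Ala, Ala, stop.
toy_cds = "ATGGCTGCCGCAGCTTAA"
alanine_codons = ["GCT", "GCC", "GCA", "GCG"]

toy_gc = gc_content(toy_cds)
toy_counts = codon_counts(toy_cds)
toy_rscu = rscu(toy_counts, alanine_codons)
```

In this toy case GCT is used twice among four alanine codons, so its RSCU is 2.0, while the unused GCG scores 0.0. Real analyses would run over full coding sequences and all amino acids.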
Tumor Necrosis Factor-α as a Diagnostic Marker for Neonatal Sepsis: A Meta-Analysis Tue, 11 Feb 2014 11:39:28 +0000 Neonatal sepsis (NS) is an important cause of mortality in newborns and a life-threatening disorder in infants. The meta-analysis was performed to investigate the diagnostic value of the tumor necrosis factor-α (TNF-α) test in NS. We searched PUBMED, EMBASE, and the Cochrane Library for studies between March 1994 and August 2013. In total, 347 studies were collected, from which 15 articles and 23 trials were selected to study NS in our meta-analysis. The TNF-α test showed moderate accuracy in the diagnosis of NS both in early-onset neonatal sepsis (sensitivity = 0.66, specificity = 0.76, Q* = 0.74) and in late-onset neonatal sepsis (sensitivity = 0.68, specificity = 0.89, Q* = 0.87). We also found that the northern hemisphere group in the test had higher sensitivity (0.84) and specificity (0.83). A diagnostic OR analysis found that the study population may be the major reason for the heterogeneity. Accordingly, we suggest that TNF-α is also a valuable marker in the diagnosis of NS. Bokun Lv, Jie Huang, Haining Yuan, Wenying Yan, Guang Hu, and Jian Wang Copyright © 2014 Bokun Lv et al. All rights reserved. A Neural-Network-Based Approach to White Blood Cell Classification Thu, 30 Jan 2014 00:00:00 +0000 This paper presents a new white blood cell classification system for the recognition of five types of white blood cells. We propose a new segmentation algorithm for the segmentation of white blood cells from smear images. The core idea of the proposed segmentation algorithm is to find a discriminating region of white blood cells on the HSI color space. Pixels whose color lies in the discriminating region, which is described by an ellipsoid, will be regarded as the nucleus and granule of cytoplasm of a white blood cell. Then, through a further morphological process, we can segment a white blood cell from a smear image. 
Three kinds of features (i.e., geometrical features, color features, and LDP-based texture features) are extracted from the segmented cell. These features are fed into three different kinds of neural networks to recognize the types of the white blood cells. To test the effectiveness of the proposed white blood cell classification system, a total of 450 white blood cell images were used. The highest overall correct recognition rate reached 99.11%. Simulation results showed that the proposed white blood cell classification system was very competitive with some existing systems. Mu-Chun Su, Chun-Yen Cheng, and Pa-Chun Wang Copyright © 2014 Mu-Chun Su et al. All rights reserved. Genome-Wide Characterisation of Gene Expression in Rice Leaf Blades at 25°C and 30°C Wed, 29 Jan 2014 09:39:07 +0000 Rice growth is greatly affected by temperature. To examine how temperature influences gene expression in rice on a genome-wide basis, we utilised recently compiled next-generation sequencing datasets and characterised a number of RNA-sequence transcriptome samples in rice seedling leaf blades at 25°C and 30°C. Our analysis indicated that 50.4% of all genes in the rice genome (28,296/56,143) were expressed in rice samples grown at 25°C, whereas slightly fewer genes (50.2%; 28,189/56,143) were expressed in rice leaf blades grown at 30°C. Among the genes that were expressed, approximately 3% were highly expressed, whereas approximately 65% had low levels of expression. Further examination demonstrated that 821 genes had a twofold or higher increase in expression and that 553 genes had a twofold or greater decrease in expression at 25°C. Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses suggested that the ribosome pathway and multiple metabolic pathways were upregulated at 25°C. 
Based on these results, we deduced that gene expression at both transcriptional and translational levels was stimulated at 25°C, perhaps in response to a suboptimal temperature condition. Finally, we observed that temperature markedly regulates several super-families of transcription factors, including bZIP, MYB, and WRKY. Zhi-guo E, Lei Wang, Ryan Qin, Haihong Shen, and Jianhua Zhou Copyright © 2014 Zhi-guo E et al. All rights reserved. Ratsnake: A Versatile Image Annotation Tool with Application to Computer-Aided Diagnosis Mon, 27 Jan 2014 19:11:15 +0000 Image segmentation and annotation are key components of image-based medical computer-aided diagnosis (CAD) systems. In this paper we present Ratsnake, a publicly available generic image annotation tool providing annotation efficiency, semantic awareness, versatility, and extensibility, features that can be exploited to transform it into an effective CAD system. In order to demonstrate this unique capability, we present its novel application for the evaluation and quantification of salient objects and structures of interest in kidney biopsy images. Accurate annotation identifying and quantifying such structures in microscopy images can provide an estimation of pathogenesis in obstructive nephropathy, which is a rather common disease with severe implication in children and infants. However a tool for detecting and quantifying the disease is not yet available. A machine learning-based approach, which utilizes prior domain knowledge and textural image features, is considered for the generation of an image force field customizing the presented tool for automatic evaluation of kidney biopsy images. The experimental evaluation of the proposed application of Ratsnake demonstrates its efficiency and effectiveness and promises its wide applicability across a variety of medical imaging domains. D. K. Iakovidis, T. Goudas, C. Smailis, and I. Maglogiannis Copyright © 2014 D. K. Iakovidis et al. All rights reserved. 
Verification and Optimal Control of Context-Sensitive Probabilistic Boolean Networks Using Model Checking and Polynomial Optimization Thu, 23 Jan 2014 14:19:55 +0000 One of the significant topics in systems biology is to develop control theory of gene regulatory networks (GRNs). In typical control of GRNs, expression of some genes is inhibited (activated) by manipulating external stimuli and expression of other genes. It is expected to apply control theory of GRNs to gene therapy technologies in the future. In this paper, a control method using a Boolean network (BN) is studied. A BN is widely used as a model of GRNs, and gene expression is expressed by a binary value (ON or OFF). In particular, a context-sensitive probabilistic Boolean network (CS-PBN), which is one of the extended models of BNs, is used. For CS-PBNs, the verification problem and the optimal control problem are considered. For the verification problem, a solution method using the probabilistic model checker PRISM is proposed. For the optimal control problem, a solution method using polynomial optimization is proposed. Finally, a numerical example on the WNT5A network, which is related to melanoma, is presented. The proposed methods provide us useful tools in control theory of GRNs. Koichi Kobayashi and Kunihiko Hiraishi Copyright © 2014 Koichi Kobayashi and Kunihiko Hiraishi. All rights reserved. A Web-Server of Cell Type Discrimination System Wed, 22 Jan 2014 16:08:57 +0000 Discriminating cell types is a daily request for stem cell biologists. However, there is not a user-friendly system available to date for public users to discriminate the common cell types, embryonic stem cells (ESCs), induced pluripotent stem cells (iPSCs), and somatic cells (SCs). Here, we develop WCTDS, a web-server of cell type discrimination system, to discriminate the three cell types and their subtypes like fetal versus adult SCs. 
WCTDS is developed as a top layer application of our recently published work on cell type discrimination, which employs DNA methylation as a biomarker and machine learning models to discriminate cell types. Implemented with Django, Python, R, and Linux shell programming, running on a Linux-Apache web server, and communicating through MySQL, WCTDS provides a friendly framework to efficiently receive the user input, to run mathematical models for analyzing data, and then to present results to users. This framework is flexible and easy to extend for other applications. Therefore, WCTDS works as a user-friendly framework to discriminate cell types and subtypes, and it can also be extended to detect other cell types like cancer cells. Anyou Wang, Yan Zhong, Yanhua Wang, and Qianchuan He Copyright © 2014 Anyou Wang et al. All rights reserved. Discovery of Novel Inhibitors for Nek6 Protein through Homology Model Assisted Structure Based Virtual Screening and Molecular Docking Approaches Wed, 22 Jan 2014 13:02:48 +0000 Nek6 is a member of the NIMA (never in mitosis, gene A)-related serine/threonine kinase family that plays an important role in the initiation of mitotic cell cycle progression. This work is an attempt to emphasize the structural and functional relationship of Nek6 protein based on homology modeling and binding pocket analysis. The three-dimensional structure of Nek6 was constructed by molecular modeling studies and the best model was further assessed by PROCHECK, ProSA, and ERRAT plot in order to analyze the quality and consistency of the generated model. The overall quality assessment of the computed model showed 87.4% of amino acid residues in the favored region. A 3 ns molecular dynamics simulation confirmed that the structure was reliable and stable. Two lead compounds (Binding database ID: 15666, 18602) were retrieved through structure-based virtual screening and induced fit docking approaches as novel Nek6 inhibitors. 
Hence, we concluded that the potential compounds may act as new leads for the design of Nek6 inhibitors. P. Srinivasan, P. Chella Perumal, and A. Sudha Copyright © 2014 P. Srinivasan et al. All rights reserved. Protein-Protein Interactions Prediction Based on Iterative Clique Extension with Gene Ontology Filtering Wed, 22 Jan 2014 07:36:41 +0000 Cliques (maximal complete subnets) in a protein-protein interaction (PPI) network are an important resource used to analyze protein complexes and functional modules. Clique-based methods of predicting PPI compensate for the deficiencies of data from biological experiments. However, clique-based prediction methods depend only on the topology of the network. The false-positive and false-negative interactions in a network usually interfere with prediction. Therefore, we propose a method combining clique-based prediction and gene ontology (GO) annotations to overcome this shortcoming and improve the accuracy of predictions. According to different GO correcting rules, we generate two predicted interaction sets which guarantee the quality and quantity of predicted protein interactions. The proposed method is applied to the PPI network from the Database of Interacting Proteins (DIP) and most of the predicted interactions are verified by another biological database, BioGRID. The predicted protein interactions are appended to the original protein network, which leads to clique extension and shows the significance of biological meaning. Lei Yang and Xianglong Tang Copyright © 2014 Lei Yang and Xianglong Tang. All rights reserved. RRHGE: A Novel Approach to Classify the Estrogen Receptor Based Breast Cancer Subtypes Sun, 19 Jan 2014 13:52:52 +0000 Background. Breast cancer is the most common type of cancer among females with a high mortality rate. It is essential to classify the estrogen receptor based breast cancer subtypes into correct subclasses, so that the right treatments can be applied to lower the mortality rate. 
Using gene signatures derived from gene interaction networks to classify breast cancers has proven to be more reproducible and can achieve higher classification performance. However, the interactions in the gene interaction network usually contain many false-positive interactions that do not have any biological meanings. Therefore, it is a challenge to incorporate the reliability assessment of interactions when deriving gene signatures from gene interaction networks. How to effectively extract gene signatures from available resources is critical to the success of cancer classification. Methods. We propose a novel method to measure and extract the reliable (biologically true or valid) interactions from gene interaction networks and incorporate the extracted reliable gene interactions into our proposed RRHGE algorithm to identify significant gene signatures from microarray gene expression data for classifying ER+ and ER− breast cancer samples. Results. The evaluation on real breast cancer samples showed that our RRHGE algorithm achieved higher classification accuracy than the existing approaches. Ashish Saini, Jingyu Hou, and Wanlei Zhou Copyright © 2014 Ashish Saini et al. All rights reserved. eFisioTrack: A Telerehabilitation Environment Based on Motion Recognition Using Accelerometry Sun, 12 Jan 2014 00:00:00 +0000 The growing demand for physical rehabilitation processes can result in the rising of costs and waiting lists, becoming a threat to healthcare services’ sustainability. Telerehabilitation solutions can help in this issue by discharging patients from points of care while improving their adherence to treatment. Sensing devices are used to collect data so that the physiotherapists can monitor and evaluate the patients’ activity in the scheduled sessions. This paper presents a software platform that aims to meet the needs of the rehabilitation experts and the patients along a physical rehabilitation plan, allowing its use in outpatient scenarios. 
It is meant to be low-cost and easy-to-use, improving the experience of patients and experts. We show the satisfactory results already obtained from its use, in terms of accuracy in evaluating the exercises and the degree of users’ acceptance. We conclude that this platform is suitable and technically feasible to carry out rehabilitation plans outside the point of care. Daniel Ruiz-Fernandez, Oscar Marín-Alonso, Antonio Soriano-Paya, and Joaquin D. García-Pérez Copyright © 2014 Daniel Ruiz-Fernandez et al. All rights reserved. Prediction and Analysis of Surface Hydrophobic Residues in Tertiary Structure of Proteins Thu, 09 Jan 2014 12:51:05 +0000 The analysis of protein structures provides plenty of information about the factors governing the folding and stability of proteins, the preferred amino acids in the protein environment, the location of the residues in the interior/surface of a protein and so forth. In general, hydrophobic residues such as Val, Leu, Ile, Phe, and Met tend to be buried in the interior, while polar side chains are exposed to solvent. The present work depends on sequence as well as structural information of the protein and aims to understand the nature of hydrophobic residues on the protein surfaces. It is based on the nonredundant data set of 218 monomeric proteins. Solvent accessibility of each protein was determined using NACCESS software. Homologous sequences were then obtained to assess how well solvent-exposed and buried hydrophobic residues are evolutionarily conserved, and confidence scores for being buried or solvent exposed were assigned to hydrophobic residues based on conservation scores and knowledge of the flanking regions of hydrophobic residues. 
In the absence of a three-dimensional structure, the ability to predict surface accessibility of hydrophobic residues directly from the sequence is of great help in choosing the sites of chemical modification or specific mutations and in the studies of protein stability and molecular interactions. Shambhu Malleshappa Gowder, Jhinuk Chatterjee, Tanusree Chaudhuri, and Kusum Paul Copyright © 2014 Shambhu Malleshappa Gowder et al. All rights reserved. Large Scale Explorative Oligonucleotide Probe Selection for Thousands of Genetic Groups on a Computing Grid: Application to Phylogenetic Probe Design Using a Curated Small Subunit Ribosomal RNA Gene Database Mon, 06 Jan 2014 09:56:46 +0000 Phylogenetic Oligonucleotide Arrays (POAs) were recently adapted for studying the huge microbial communities in a flexible and easy-to-use way. POA coupled with the use of explorative probes to detect the unknown part is now one of the most powerful approaches for a better understanding of microbial community functioning. However, the selection of probes remains a very difficult task. The rapid growth of environmental databases has led to an exponential increase of data to be managed for an efficient design. Consequently, the use of high performance computing facilities is mandatory. In this paper, we present an efficient parallelization method to select known and explorative oligonucleotide probes at large scale using computing grids. We implemented a software that generates and monitors thousands of jobs over the European Computing Grid Infrastructure (EGI). We also developed a new algorithm for the construction of a high-quality curated phylogenetic database to avoid erroneous design due to bad sequence affiliation. We present here the performance and statistics of our method on real biological datasets based on a phylogenetic prokaryotic database at the genus level and a complete design of about 20,000 probes for 2,069 genera of prokaryotes. 
Faouzi Jaziri, Eric Peyretaillade, Mohieddine Missaoui, Nicolas Parisot, Sébastien Cipière, Jérémie Denonfoux, Antoine Mahul, Pierre Peyret, and David R. C. Hill Copyright © 2014 Faouzi Jaziri et al. All rights reserved. Nonlinear-Model-Based Analysis Methods for Time-Course Gene Expression Data Thu, 02 Jan 2014 15:48:10 +0000 Microarray technology has produced a huge body of time-course gene expression data and will continue to produce more. Such gene expression data has been proved useful in genomic disease diagnosis and drug design. The challenge is how to uncover useful information from such data by proper analysis methods such as significance analysis and clustering analysis. Many statistic-based significance analysis methods and distance/correlation-based clustering analysis methods have been applied to time-course expression data. However, these techniques are unable to account for the dynamics of such data. It is the dynamics that characterizes such data and that should be considered in analysis of such data. In this paper, we employ a nonlinear model to analyse time-course gene expression data. We firstly develop an efficient method for estimating the parameters in the nonlinear model. Then we utilize this model to perform the significance analysis of individually differentially expressed genes and clustering analysis of a set of gene expression profiles. The verification with two synthetic datasets shows that our developed significance analysis method and cluster analysis method outperform some existing methods. The application to one real-life biological dataset illustrates that the analysis results of our developed methods are in agreement with the existing results. Li-Ping Tian, Li-Zhi Liu, and Fang-Xiang Wu Copyright © 2014 Li-Ping Tian et al. All rights reserved. 
Adaptive Shooting Regularization Method for Survival Analysis Using Gene Expression Data Sun, 15 Dec 2013 13:32:09 +0000 A new adaptive shooting regularization method for variable selection based on Cox’s proportional hazards model is proposed. This adaptive shooting algorithm can be easily obtained by the optimization of a reweighted iterative series of penalties and a shooting strategy of penalty. Simulation results based on high dimensional artificial data show that the adaptive shooting regularization method can be more accurate for variable selection than Lasso and adaptive Lasso methods. The results from a real gene expression dataset (DLBCL) also indicate that the regularization method performs competitively. Xiao-Ying Liu, Yong Liang, Zong-Ben Xu, Hai Zhang, and Kwong-Sak Leung Copyright © 2013 Xiao-Ying Liu et al. All rights reserved. Identification of Biomarkers for Esophageal Squamous Cell Carcinoma Using Feature Selection and Decision Tree Methods Thu, 12 Dec 2013 11:04:28 +0000 Esophageal squamous cell cancer (ESCC) is one of the most common fatal human cancers. The identification of biomarkers for early detection could be a promising strategy to decrease mortality. Previous studies utilized microarray techniques to identify more than one hundred genes; however, it is desirable to identify a small set of biomarkers for clinical use. This study proposes a sequential forward feature selection algorithm to design decision tree models for discriminating ESCC from normal tissues. Two potential biomarkers, RUVBL1 and CNIH, were identified and validated based on two publicly available microarray datasets. To test the discrimination ability of the two biomarkers, 17 pairs of expression profiles of ESCC and normal tissues from Taiwanese male patients were measured by using microarray techniques. The classification accuracies of the two biomarkers in all three datasets were higher than 90%. 
Interpretable decision tree models were constructed to analyze expression patterns of the two biomarkers. RUVBL1 was consistently overexpressed in all three datasets, although we found inconsistent CNIH expression possibly affected by the diverse major risk factors for ESCC across different areas. Chun-Wei Tung, Ming-Tsang Wu, Yu-Kuei Chen, Chun-Chieh Wu, Wei-Chung Chen, Hsien-Pin Li, Shah-Hwa Chou, Deng-Chyang Wu, and I-Chen Wu Copyright © 2013 Chun-Wei Tung et al. All rights reserved. Gene Structures, Classification, and Expression Models of the DREB Transcription Factor Subfamily in Populus trichocarpa Wed, 13 Nov 2013 18:11:28 +0000 We identified 75 dehydration-responsive element-binding (DREB) protein genes in Populus trichocarpa. We analyzed gene structures, phylogenies, domain duplications, genome localizations, and expression profiles. The phylogenic construction suggests that the PtrDREB gene subfamily can be classified broadly into six subtypes (DREB A-1 to A-6) in Populus. The chromosomal localizations of the PtrDREB genes indicated 18 segmental duplication events involving 36 genes and six redundant PtrDREB genes were involved in tandem duplication events. There were fewer introns in the PtrDREB subfamily. The motif composition of PtrDREB was highly conserved in the same subtype. We investigated expression profiles of this gene subfamily from different tissues and/or developmental stages. Sixteen genes present in the digital expression analysis had high levels of transcript accumulation. The microarray results suggest that 18 genes were upregulated. We further examined the stress responsiveness of 15 genes by qRT-PCR. A digital northern analysis showed that the PtrDREB17, 18, and 32 genes were highly induced in leaves under cold stress, and the same expression trends were shown by qRT-PCR. Taken together, these observations may lay the foundation for future functional analyses to unravel the biological roles of Populus’ DREB genes. 
Yunlin Chen, Jingli Yang, Zhanchao Wang, Haizhen Zhang, Xuliang Mao, and Chenghao Li Copyright © 2013 Yunlin Chen et al. All rights reserved. Molecular Dynamic Simulation to Explore the Molecular Basis of Btk-PH Domain Interaction with Ins(1,3,4,5)P4 Wed, 06 Nov 2013 14:31:40 +0000 Bruton’s tyrosine kinase contains a pleckstrin homology domain, and it specifically binds inositol 1,3,4,5-tetrakisphosphate (Ins(1,3,4,5)P4), which is involved in the maturation of B cells. In this paper, we studied 12 systems including the wild type and 11 mutants, K12R, S14F, K19E, R28C/H, E41K, L11P, F25S, Y40N, and K12R-R28C/H, to investigate any change in the ligand binding site of each mutant. Molecular dynamics simulations combined with the method of molecular mechanics/Poisson-Boltzmann solvent-accessible surface area have been applied to the twelve systems, and reasonable mutant structures and their binding free energies have been obtained as criteria in the final classification. As a result, five structures, K12R, K19E, R28C/H, and E41K mutants, were classified as “functional mutations,” whereas L11P, S14F, F25S, and Y40N were grouped into “folding mutations.” This rigorous study of the binding affinity of each of the mutants and their classification provides some new insights into the biological function of the Btk-PH domain and related mutation-causing diseases. Dan Lu, Junfeng Jiang, Zhongjie Liang, Maomin Sun, Cheng Luo, Bairong Shen, and Guang Hu Copyright © 2013 Dan Lu et al. All rights reserved. Application of Bioinformatics in Chronobiology Research Wed, 25 Sep 2013 17:39:04 +0000 Bioinformatics and other well-established sciences, such as molecular biology, genetics, and biochemistry, provide a scientific approach for the analysis of data generated through “omics” projects that may be used in studies of chronobiology. The results of studies that apply these techniques demonstrate how they significantly aided the understanding of chronobiology. 
However, bioinformatics tools alone cannot eliminate the need for an understanding of the field of research or the data to be considered, nor can such tools replace analysts and researchers. It is often necessary to conduct an evaluation of the results of a data mining effort to determine the degree of reliability. To this end, familiarity with the field of investigation is necessary. It is evident that the knowledge that has been accumulated through chronobiology and the use of tools derived from bioinformatics has contributed to the recognition and understanding of the patterns and biological rhythms found in living organisms. The current work aims to develop new and important applications in the near future through chronobiology research. Robson da Silva Lopes, Nathalia Maria Resende, Adenilda Cristina Honorio-França, and Eduardo Luzía França Copyright © 2013 Robson da Silva Lopes et al. All rights reserved. Phylogenetic, Expression, and Bioinformatic Analysis of the ABC1 Gene Family in Populus trichocarpa Sun, 15 Sep 2013 17:48:10 +0000 We studied 17 ABC1 genes in Populus trichocarpa, all of which contained an ABC1 domain consisting of about 120 amino acid residues. Most of the ABC1 gene products were located in the mitochondria or chloroplasts. All had a conserved VAVK-like motif and a DFG motif. Phylogenetic analysis grouped the genes into three subgroups. In addition, the chromosomal locations of the genes on the 19 Populus chromosomes were determined. Gene structure was studied through exon/intron organization and the MEME motif finder, while heatmap was used to study the expression diversity using EST libraries. According to the heatmap, PtrABC1P14 was highlighted because of the high expression in tension wood which related to secondary cell wall formation and cellulose synthesis, thus making a contribution to follow-up experiment in wood formation. 
Promoter cis-element analysis indicated that almost all of the ABC1 genes contained one or two cis-elements related to ABA signal transduction pathway and drought stress. Quantitative real-time PCR was carried out to evaluate the expression of all of the genes under abiotic stress conditions (ABA, CdCl2, high temperature, high salinity, and drought); the results showed that some of the genes were affected by these stresses and confirmed the results of promoter cis-element analysis. Zhanchao Wang, Haizhen Zhang, Jingli Yang, Yunlin Chen, Xuemei Xu, Xuliang Mao, and Chenghao Li Copyright © 2013 Zhanchao Wang et al. All rights reserved. Bioinformatics and Biomedical Informatics Wed, 29 May 2013 17:14:06 +0000 Kayvan Najarian, Rachid Deriche, Mark A. Kon, and Nina S. T. Hirata Copyright © 2013 Kayvan Najarian et al. All rights reserved. A Hierarchical Method for Removal of Baseline Drift from Biomedical Signals: Application in ECG Analysis Mon, 20 May 2013 15:41:31 +0000 Noise can compromise the extraction of some fundamental and important features from biomedical signals and hence prohibit accurate analysis of these signals. Baseline wander in electrocardiogram (ECG) signals is one such example, which can be caused by factors such as respiration, variations in electrode impedance, and excessive body movements. Unless baseline wander is effectively removed, the accuracy of any feature extracted from the ECG, such as timing and duration of the ST-segment, is compromised. This paper approaches this filtering task from a novel standpoint by assuming that the ECG baseline wander comes from an independent and unknown source. The technique utilizes a hierarchical method including a blind source separation (BSS) step, in particular independent component analysis, to eliminate the effect of the baseline wander. We examine the specifics of the components causing the baseline wander and the factors that affect the separation process. 
Experimental results reveal the superiority of the proposed algorithm in removing the baseline wander. Yurong Luo, Rosalyn H. Hargraves, Ashwin Belle, Ou Bai, Xuguang Qi, Kevin R. Ward, Michael Paul Pfaffenberger, and Kayvan Najarian Copyright © 2013 Yurong Luo et al. All rights reserved. Extracting Physicochemical Features to Predict Protein Secondary Structure Tue, 14 May 2013 11:09:14 +0000 We propose a protein secondary structure prediction method based on position-specific scoring matrix (PSSM) profiles and four physicochemical features including conformation parameters, net charges, hydrophobic, and side chain mass. First, the SVM with the optimal window size and the optimal parameters of the kernel function is found. Then, we train the SVM using the PSSM profiles generated from PSI-BLAST and the physicochemical features extracted from the CB513 data set. Finally, we use the filter to refine the predicted results from the trained SVM. For all the performance measures of our method, reaches 79.52, SOV94 reaches 86.10, and SOV99 reaches 74.60; all the measures are higher than those of the SVMpsi method and the SVMfreq method. This validates that considering these physicochemical features in predicting protein secondary structure would exhibit better performances. Yin-Fu Huang and Shu-Ying Chen Copyright © 2013 Yin-Fu Huang and Shu-Ying Chen. All rights reserved. Functional Implications of Local DNA Structures in Regulatory Motifs Tue, 14 May 2013 11:06:33 +0000 The three-dimensional structure of DNA has been proposed to be a major determinant for functional transcription factors (TFs) and DNA interaction. Here, we use hydroxyl radical cleavage pattern as a measure of local DNA structure. We compared the conservation between DNA sequence and structure in terms of information content and attempted to assess the functional implications of DNA structures in regulatory motifs. 
We used statistical methods to evaluate the structural divergence of substituting a single position within a binding site and applied them to a collection of putative regulatory motifs. The following are our major observations: (i) we observed more information in structural alignment than in the corresponding sequence alignment for most of the transcription factors; (ii) for each TF, the majority of positions have more information in the structural alignment as compared to the sequence alignment; (iii) we further defined a DNA structural divergence score (SD score) for each wild-type and mutant pair that is distinguished by single-base mutation. The SD score for benign mutations is significantly lower than that of switch mutations. This indicates that structural conservation is also important for a TFBS to be functional and that DNA structures provide previously unappreciated information that TFs use to achieve binding specificity. Qian Xiang Copyright © 2013 Qian Xiang. All rights reserved. Discovering Weighted Patterns in Intron Sequences Using Self-Adaptive Harmony Search and Back-Propagation Algorithms Wed, 08 May 2013 11:23:36 +0000 A hybrid self-adaptive harmony search and back-propagation mining system was proposed to discover weighted patterns in human intron sequences. By testing the weights under a lazy nearest neighbor classifier, the numerical results revealed the significance of these weighted patterns. Comparing these weighted patterns with the popular intron consensus model, it is clear that the discovered weighted patterns make the originally ambiguous 5SS and 3SS header patterns more specific and concrete. Yin-Fu Huang, Chia-Ming Wang, and Sing-Wu Liou Copyright © 2013 Yin-Fu Huang et al. All rights reserved. 
Mortality Predicted Accuracy for Hepatocellular Carcinoma Patients with Hepatic Resection Using Artificial Neural Network Tue, 30 Apr 2013 08:15:04 +0000 The aim of this present study is firstly to compare significant predictors of mortality for hepatocellular carcinoma (HCC) patients undergoing resection between artificial neural network (ANN) and logistic regression (LR) models and secondly to evaluate the predictive accuracy of ANN and LR in different survival year estimation models. We constructed a prognostic model for 434 patients with 21 potential input variables by Cox regression model. Model performance was measured by numbers of significant predictors and predictive accuracy. The results indicated that ANN had double to triple numbers of significant predictors at 1-, 3-, and 5-year survival models as compared with LR models. Scores of accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC) of 1-, 3-, and 5-year survival estimation models using ANN were superior to those of LR in all the training sets and most of the validation sets. The study demonstrated that ANN not only had a great number of predictors of mortality variables but also provided accurate prediction, as compared with conventional methods. It is suggested that physicians consider using data mining methods as supplemental tools for clinical decision-making and prognostic evaluation. Herng-Chia Chiu, Te-Wei Ho, King-Teh Lee, Hong-Yaw Chen, and Wen-Hsien Ho Copyright © 2013 Herng-Chia Chiu et al. All rights reserved. Survival Analysis by Penalized Regression and Matrix Factorization Tue, 23 Apr 2013 09:34:57 +0000 Because every disease has its unique survival pattern, it is necessary to find a suitable model to simulate followups. DNA microarray is a useful technique to detect thousands of gene expressions at one time and is usually employed to classify different types of cancer. 
We propose combination methods of penalized regression models and nonnegative matrix factorization (NMF) for predicting survival. We tried L1 (lasso), L2 (ridge), and L1-L2 combined (elastic net) penalized regression for diffuse large B-cell lymphoma (DLBCL) patients' microarray data and found that the L1-L2 combined method predicts survival best with the smallest logrank value. Furthermore, 80% of selected genes have been reported to correlate with carcinogenesis or lymphoma. Through NMF we found that DLBCL patients can be divided clearly into 4 groups, which implies that DLBCL may have 4 subtypes with slightly different survival patterns. Next we excluded some patients whom NMF indicated were hard to classify and executed the three penalized regression models again. We found that the performance of survival prediction improved, with lower logrank values. Therefore, we conclude that after preselection of patients by NMF, penalized regression models can predict DLBCL patients' survival successfully. Yeuntyng Lai, Morihiro Hayashida, and Tatsuya Akutsu Copyright © 2013 Yeuntyng Lai et al. All rights reserved. Prediction of Associations between OMIM Diseases and MicroRNAs by Random Walk on OMIM Disease Similarity Network Wed, 20 Mar 2013 10:39:58 +0000 Increasing evidence has revealed that microRNAs (miRNAs) play important roles in the development and progression of human diseases. However, efforts made to uncover OMIM disease-miRNA associations are lacking and the majority of diseases in the OMIM database are not associated with any miRNA. Therefore, there is a strong incentive to develop computational methods to detect potential OMIM disease-miRNA associations. In this paper, random walk on OMIM disease similarity network is applied to predict potential OMIM disease-miRNA associations under the assumption that functionally related miRNAs are often associated with phenotypically similar diseases. Our method makes full use of global disease similarity values. 
We tested our method on 1226 known OMIM disease-miRNA associations in the framework of leave-one-out cross-validation and achieved an area under the ROC curve of 71.42%. This performance enables us to predict a number of new potential OMIM disease-miRNA associations, and the newly predicted associations are publicly released to facilitate future studies. Some predicted associations with high ranks were manually checked and were confirmed from the publicly available databases, which provides strong evidence for the practical relevance of our method. Hailin Chen and Zuping Zhang Copyright © 2013 Hailin Chen and Zuping Zhang. All rights reserved. A Comparative Genomic Study in Schizophrenic and in Bipolar Disorder Patients, Based on Microarray Expression Profiling Meta-Analysis Sun, 10 Mar 2013 10:07:12 +0000 Schizophrenia, affecting almost 1% of the global population, and bipolar disorder, affecting almost 3%–5%, constitute two severe mental disorders. The catecholaminergic and the serotonergic pathways have been proved to play an important role in the development of schizophrenia, bipolar disorder, and other related psychiatric disorders. The aim of the study was to perform and interpret the results of a comparative genomic profiling study in schizophrenic patients as well as in healthy controls and in patients with bipolar disorder and try to relate and integrate our results with an aberrant amino acid transport through cell membranes. In particular we have focused on genes and mechanisms involved in amino acid transport through cell membranes from whole genome expression profiling data. We performed bioinformatic analysis on raw data derived from four different published studies. In two studies postmortem samples from prefrontal cortices, derived from patients with bipolar disorder, schizophrenia, and control subjects, have been used. 
In another study we used samples from postmortem orbitofrontal cortex of bipolar subjects while the final study was performed based on raw data from a gene expression profiling dataset in the postmortem superior temporal cortex of schizophrenics. The data were downloaded from NCBI's GEO datasets. Marianthi Logotheti, Olga Papadodima, Nikolaos Venizelos, Aristotelis Chatziioannou, and Fragiskos Kolisis Copyright © 2013 Marianthi Logotheti et al. All rights reserved. NCK2 Is Significantly Associated with Opiates Addiction in African-Origin Men Thu, 28 Feb 2013 17:12:27 +0000 Substance dependence is a complex environmental and genetic disorder with significant social and medical concerns. Understanding the etiology of substance dependence is imperative to the development of effective treatment and prevention strategies. To this end, substantial effort has been made to identify genes underlying substance dependence, and in recent years, genome-wide association studies (GWASs) have led to discoveries of numerous genetic variants for complex diseases including substance dependence. Most of the GWAS discoveries were only based on single nucleotide polymorphisms (SNPs) and a single dichotomized outcome. By employing both SNP- and gene-based methods of analysis, we identified a strong (odds ratio = 13.87) and significant (P value = ) association of an SNP in the NCK2 gene on chromosome 2 with opiates addiction in African-origin men. Codependence analysis also identified a genome-wide significant association between NCK2 and comorbidity of substance dependence (P value = ) in African-origin men. Furthermore, we observed that the association between the NCK2 gene (P value = ) and opiates addiction reached the gene-based genome-wide significant level. In summary, our findings provided the first evidence for the involvement of NCK2 in the susceptibility to opiates addiction and further revealed the racial and gender specificities of its impact. 
Zhifa Liu, Xiaobo Guo, Yuan Jiang, and Heping Zhang Copyright © 2013 Zhifa Liu et al. All rights reserved. Biomedical Informatics for Computer-Aided Decision Support Systems: A Survey Mon, 04 Feb 2013 11:23:23 +0000 The volumes of current patient data as well as their complexity make clinical decision making more challenging than ever for physicians and other care givers. This situation calls for the use of biomedical informatics methods to process data and form recommendations and/or predictions to assist such decision makers. The design, implementation, and use of biomedical informatics systems in the form of computer-aided decision support have become essential and widely used over the last two decades. This paper provides a brief review of such systems, their application protocols and methodologies, and the future challenges and directions they suggest. Ashwin Belle, Mark A. Kon, and Kayvan Najarian Copyright © 2013 Ashwin Belle et al. All rights reserved. TOPPER: Topology Prediction of Transmembrane Protein Based on Evidential Reasoning Thu, 17 Jan 2013 09:12:31 +0000 The topology prediction of transmembrane protein is a hot research field in bioinformatics and molecular biology. It is a typical pattern recognition problem. Various prediction algorithms are developed to predict the transmembrane protein topology since the experimental techniques have been restricted by many stringent conditions. Usually, these individual prediction algorithms depend on various principles such as the hydrophobicity or charges of residues. In this paper, an evidential topology prediction method for transmembrane protein is proposed based on evidential reasoning, which is called TOPPER (topology prediction of transmembrane protein based on evidential reasoning). In the proposed method, the prediction results of multiple individual prediction algorithms can be transformed into BPAs (basic probability assignments) according to the confusion matrix. 
Then, the final prediction result can be obtained by the combination of each individual prediction based on Dempster’s rule of combination. The experimental results show that the proposed method is superior to the individual prediction algorithms, which illustrates the effectiveness of the proposed method. Xinyang Deng, Qi Liu, Yong Hu, and Yong Deng Copyright © 2013 Xinyang Deng et al. All rights reserved. Robust Microarray Meta-Analysis Identifies Differentially Expressed Genes for Clinical Prediction Tue, 18 Dec 2012 11:10:42 +0000 Combining multiple microarray datasets increases sample size and leads to improved reproducibility in identification of informative genes and subsequent clinical prediction. Although microarrays have increased the rate of genomic data collection, sample size is still a major issue when identifying informative genetic biomarkers. Because of this, feature selection methods often suffer from false discoveries, resulting in poorly performing predictive models. We develop a simple meta-analysis-based feature selection method that captures the knowledge in each individual dataset and combines the results using a simple rank average. In a comprehensive study that measures robustness in terms of clinical application (i.e., breast, renal, and pancreatic cancer), microarray platform heterogeneity, and classifier (i.e., logistic regression, diagonal LDA, and linear SVM), we compare the rank average meta-analysis method to five other meta-analysis methods. Results indicate that rank average meta-analysis consistently performs well compared to five other meta-analysis methods. John H. Phan, Andrew N. Young, and May D. Wang Copyright © 2012 John H. Phan et al. All rights reserved. 
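The rank-average idea in the preceding abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how genes are scored within each dataset, so the per-dataset scores below are hypothetical placeholders (any "higher is more informative" statistic would do), and the function names `rank_genes`, `rank_average`, and `select_top` are invented for illustration.

```python
from statistics import mean


def rank_genes(scores):
    """Rank genes within one dataset: rank 1 = highest score (ties broken by name)."""
    ordered = sorted(scores, key=lambda gene: (-scores[gene], gene))
    return {gene: position for position, gene in enumerate(ordered, start=1)}


def rank_average(per_dataset_scores):
    """Average each gene's within-dataset rank over all datasets.

    Only genes present in every dataset are kept (an assumption of this sketch).
    A lower average rank means the gene is informative more consistently.
    """
    shared = set.intersection(*(set(scores) for scores in per_dataset_scores))
    ranks = [rank_genes({g: scores[g] for g in shared}) for scores in per_dataset_scores]
    return {gene: mean(r[gene] for r in ranks) for gene in shared}


def select_top(per_dataset_scores, k):
    """Return the k genes with the lowest average rank."""
    averaged = rank_average(per_dataset_scores)
    return sorted(averaged, key=lambda gene: (averaged[gene], gene))[:k]


# Hypothetical per-dataset scores for four genes in three datasets.
datasets = [
    {"geneA": 5.1, "geneB": 2.0, "geneC": 0.3, "geneD": 1.1},
    {"geneA": 3.9, "geneB": 4.2, "geneC": 0.1, "geneD": 0.8},
    {"geneA": 6.0, "geneB": 1.5, "geneC": 2.5, "geneD": 0.2},
]
top_two = select_top(datasets, 2)
```

Because only ranks are combined, raw scores from different platforms never need to be on a common scale, which is one plausible reason a rank average tolerates platform heterogeneity.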
Novel Computational Methodologies for Structural Modeling of Spacious Ligand Binding Sites of G-Protein-Coupled Receptors: Development and Application to Human Leukotriene B4 Receptor Mon, 10 Dec 2012 15:38:37 +0000 This paper describes a novel method to predict the activated structures of G-protein-coupled receptors (GPCRs) with high accuracy, while aiming for the use of the predicted 3D structures in in silico virtual screening in the future. We propose a new method for modeling GPCR thermal fluctuations, where conformation changes of the proteins are modeled by combining fluctuations on multiple time scales. The core idea of the method is that a molecular dynamics simulation is used to calculate average 3D coordinates of all atoms of a GPCR protein against heat fluctuation on the picosecond or nanosecond time scale, and then evolutionary computation including receptor-ligand docking simulations functions to determine the rotation angle of each helix of a GPCR protein as a movement on a longer time scale. The method was validated using human leukotriene B4 receptor BLT1 as a sample GPCR. Our study demonstrated that the proposed method was able to derive the appropriate 3D structure of the active-state GPCR which docks with its agonists. Yoko Ishino and Takanori Harada Copyright © 2012 Yoko Ishino and Takanori Harada. All rights reserved. Gene Expression Profiles for Predicting Metastasis in Breast Cancer: A Cross-Study Comparison of Classification Methods Wed, 28 Nov 2012 08:53:21 +0000 Machine learning has increasingly been used with microarray gene expression data and for the development of classifiers using a variety of methods. However, method comparisons in cross-study datasets are very scarce. This study compares the performance of seven classification methods and the effect of voting for predicting metastasis outcome in breast cancer patients, in three situations: within the same dataset or across datasets on similar or dissimilar microarray platforms. 
Combining classification results from seven classifiers into one voting decision performed significantly better during internal validation as well as external validation in similar microarray platforms than the underlying classification methods. When validating between different microarray platforms, random forest, another voting-based method, proved to be the best performing method. We conclude that voting based classifiers provided an advantage with respect to classifying metastasis outcome in breast cancer patients. Mark Burton, Mads Thomassen, Qihua Tan, and Torben A. Kruse Copyright © 2012 Mark Burton et al. All rights reserved. Sequence Comparison Alignment-Free Approach Based on Suffix Tree and L-Words Frequency Mon, 10 Sep 2012 14:39:31 +0000 The vast majority of methods available for sequence comparison rely on a first sequence alignment step, which requires a number of assumptions on evolutionary history and is sometimes very difficult or impossible to perform due to the abundance of gaps (insertions/deletions). In such cases, an alternative alignment-free method would prove valuable. Our method starts by a computation of a generalized suffix tree of all sequences, which is completed in linear time. Using this tree, the frequency of all possible words with a preset length L—L-words—in each sequence is rapidly calculated. Based on the L-words frequency profile of each sequence, a pairwise standard Euclidean distance is then computed producing a symmetric genetic distance matrix, which can be used to generate a neighbor joining dendrogram or a multidimensional scaling graph. We present an improvement to word counting alignment-free approaches for sequence comparison, by determining a single optimal word length and combining suffix tree structures to the word counting tasks. Our approach is, thus, a fast and simple application that proved to be efficient and powerful when applied to mitochondrial genomes. 
The algorithm was implemented in Python and is freely available on the web. Inês Soares, Ana Goios, and António Amorim Copyright © 2012 Inês Soares et al. All rights reserved. A Learning-Based Approach for Biomedical Word Sense Disambiguation Tue, 01 May 2012 16:10:13 +0000 In the biomedical domain, word sense ambiguity is a widespread problem, yet the bioinformatics research effort devoted to it is not commensurate, leaving room for further development. This paper presents and evaluates a learning-based approach for sense disambiguation within the biomedical domain. The main limitation with supervised methods is the need for a corpus of manually disambiguated instances of the ambiguous words. However, advances in automatic text annotation and tagging techniques, together with the plethora of knowledge sources such as ontologies and text literature in the biomedical domain, will help lessen this limitation. The proposed method utilizes the interaction model (mutual information) between the context words and the senses of the target word to induce reliable learning models for sense disambiguation. The method has been evaluated with the benchmark dataset NLM-WSD with various settings and in biomedical entity species disambiguation. The evaluation results showed that the approach is very competitive and outperforms recently reported results of other published techniques. Hisham Al-Mubaid and Sandeep Gungu Copyright © 2012 Hisham Al-Mubaid and Sandeep Gungu. All rights reserved. Ligand-Based Virtual Screening Using Bayesian Inference Network and Reweighted Fragments Tue, 01 May 2012 15:42:31 +0000 Many of the similarity-based virtual screening approaches assume that molecular fragments that are not related to the biological activity carry the same weight as the important ones. This was the reason that led to the use of Bayesian networks as an alternative to existing tools for similarity-based virtual screening. 
In our recent work, the retrieval performance of the Bayesian inference network (BIN) was observed to improve significantly when molecular fragments were reweighted using the relevance feedback information. In this paper, a set of active reference structures were used to reweight the fragments in the reference structure. In this approach, higher weights were assigned to those fragments that occur more frequently in the set of active reference structures while others were penalized. Simulated virtual screening experiments with MDL Drug Data Report datasets showed that the proposed approach significantly improved the retrieval effectiveness of ligand-based virtual screening, especially when the active molecules being sought had a high degree of structural heterogeneity. Ali Ahmed, Ammar Abdo, and Naomie Salim Copyright © 2012 Ali Ahmed et al. All rights reserved. Effects of Pooling Samples on the Performance of Classification Algorithms: A Comparative Study Mon, 30 Apr 2012 13:51:14 +0000 A pooling design can be used as a powerful strategy to compensate for limited amounts of samples or high biological variation. In this paper, we perform a comparative study to model and quantify the effects of virtual pooling on the performance of the widely applied classifiers, support vector machines (SVMs), random forest (RF), k-nearest neighbors (k-NN), penalized logistic regression (PLR), and prediction analysis for microarrays (PAMs). We evaluate a variety of experimental designs using mock omics datasets with varying levels of pool sizes and considering effects from feature selection. Our results show that feature selection significantly improves classifier performance for non-pooled and pooled data. All investigated classifiers yield lower misclassification rates with smaller pool sizes. RF mainly outperforms other investigated algorithms, while accuracy levels are comparable among all the remaining ones. 
Guidelines are derived to identify an optimal pooling scheme for obtaining adequate predictive power and, hence, to motivate a study design that best meets experimental objectives and budgetary conditions, including time constraints. Kanthida Kusonmano, Michael Netzer, Christian Baumgartner, Matthias Dehmer, Klaus R. Liedl, and Armin Graber Copyright © 2012 Kanthida Kusonmano et al. All rights reserved. A Novel Partial Sequence Alignment Tool for Finding Large Deletions Sun, 01 Apr 2012 08:52:33 +0000 Finding large deletions in genome sequences has become increasingly useful in bioinformatics, such as in clinical research and diagnosis. Although there are a number of publicly available next generation sequencing mapping and sequence alignment programs, these software packages do not correctly align fragments containing deletions larger than one kb. We present a fast alignment software package, BinaryPartialAlign, that can be used by wet lab scientists to find long structural variations in their experiments. For BinaryPartialAlign, we make use of the Smith-Waterman (SW) algorithm with a binary-search-based approach for alignment with large gaps, which we call partial alignment. The BinaryPartialAlign implementation is compared with other straightforward applications of SW. Simulation results on mtDNA fragments demonstrate the effectiveness (runtime and accuracy) of the proposed method. Taner Aruk, Duran Ustek, and Olcay Kursun Copyright © 2012 Taner Aruk et al. All rights reserved. Nonlinear Model-Based Method for Clustering Periodically Expressed Genes Tue, 01 Nov 2011 00:00:00 +0000 Clustering periodically expressed genes from their time-course expression data could help understand the molecular mechanism of those biological processes. In this paper, we propose a nonlinear model-based clustering method for periodically expressed gene profiles. 
As periodically expressed genes are associated with periodic biological processes, the proposed method naturally assumes that a periodically expressed gene dataset is generated by a number of periodical processes. Each periodical process is modelled by a linear combination of trigonometric sine and cosine functions in time plus a Gaussian noise term. A two-stage method is proposed to estimate the model parameters, and a relocation-iteration algorithm is employed to assign each gene to an appropriate cluster. A bootstrapping method and an average adjusted Rand index (AARI) are employed to measure the quality of clustering. One synthetic dataset and two biological datasets were employed to evaluate the performance of the proposed method. The results show that our method yields better-quality clustering than other clustering methods (e.g., k-means) for periodically expressed gene data, and thus it is an effective cluster analysis method for such data. Li-Ping Tian, Li-Zhi Liu, Qian-Wei Zhang, and Fang-Xiang Wu Copyright © 2011 Li-Ping Tian et al. All rights reserved. Minimum Information About a Microarray Experiment (MIAME) – Successes, Failures, Challenges Mon, 01 Jan 1900 00:00:00 +0000 The Minimum Information About a Microarray Experiment (known as MIAME) guidelines describe information that needs to be provided to enable the interpretation of the results of a microarray-based experiment unambiguously. The MIAME guidelines were developed by the Microarray Gene Expression Data (MGED) Society. Since the MIAME position paper was published in 2001, it has been cited in the scientific literature well over a thousand times. MIAME has been replicated for many other technologies, the major data repositories are supporting MIAME, and most scientific journals have adopted MIAME guidelines as a requirement for publishing. With the advent of new-generation sequencing technology, MIAME faces new challenges. 
To address this, the MGED Society has proposed new guidelines, i.e., Minimum Information about a high-throughput SeQuencing Experiment (MINSEQE). Here we present an analysis of the reasons for the success of MIAME, and discuss where it has failed and the challenges it faces. Alvis Brazma Copyright © 2009 Alvis Brazma. All rights reserved. Crystal Structures of Tcl1 Family Oncoproteins and Their Conserved Surface Features Mon, 01 Jan 1900 00:00:00 +0000 Members of the TCL1 family of oncogenes are abnormally expressed in mature T-cell leukemias and B-cell lymphomas. The proteins are involved in the coactivation of protein kinase B (Akt/PKB), a key intracellular kinase. The sequences and crystal structures of three Tcl1 proteins were analyzed in order to understand their interactions with Akt/PKB and the implications for lymphocyte malignancies. Tcl1 proteins are ~15 kD and share 25–80% amino acid sequence identity. The tertiary structures of mouse Tcl1, human Tcl1, and Mtcp1 are very similar. Analysis of the structures revealed conserved semi-planar surfaces that have characteristics of surfaces involved in protein-protein interactions. The Tcl1 proteins show differences in surface charge distribution and oligomeric state, suggesting that they do not interact in the same way with Akt/PKB and other cellular protein(s). John M. Petock, Ivan Y. Torshin, Yuan-Fang Wang, Garrett C. Du Bois, Carlo M. Croce, Robert W. Harrison, and Irene T. Weber Copyright © 2002 John M. Petock et al. All rights reserved. Protocols for 16S rDNA Array Analyses of Microbial Communities by Sequence-Specific Labeling of DNA Probes Mon, 01 Jan 1900 00:00:00 +0000 Analyses of complex microbial communities are becoming increasingly important. Bottlenecks in these analyses, however, are the tools to actually describe the biodiversity. Novel protocols for DNA array-based analyses of microbial communities are presented. 
In these protocols, the specificity obtained by sequence-specific labeling of DNA probes is combined with the possibility of detecting several different probes simultaneously by DNA array hybridization. The gene encoding 16S ribosomal RNA was chosen as the target in these analyses. This gene contains both universally conserved regions and regions with relatively high variability. The universally conserved regions are used for PCR amplification primers, while the variable regions are used for the specific probes. Protocols are presented for DNA purification, probe construction, probe labeling, and DNA array hybridizations. Knut Rudi, Janneke Treimo, Hilde Nissen, and Gerd Vegarud Copyright © 2003 Knut Rudi et al. All rights reserved. Stimulation of Apoptosis by Computationally Derived Small Molecules that Bind to BCL-2 Mon, 01 Jan 1900 00:00:00 +0000 Martha Mutomba, Jing Wang, Sergei Mailiartchouk, Tom Brady, Darryl Rideout, Christina Niemeyer, Hengyi Zhu, Cindy Fisher, Seymour Mong, and Kal Ramnarayan Copyright © 2001 Martha Mutomba et al. All rights reserved. Combining Bioinformatics and Biophysics to Understand Protein-Protein and Protein-Ligand Interactions Mon, 01 Jan 1900 00:00:00 +0000 Barry Honig Copyright © 2002 Barry Honig. All rights reserved. What Is Artificial about Life? Mon, 01 Jan 1900 00:00:00 +0000 The announcement of “Artificial Life” by the Craig Venter group, and the media stir that arose from the news, provoked thoughts about the current technologies in contemporary science and the cultural tension of such projections on the media. The increasingly blurred boundaries between specialist and generalist media, while promising a wider appreciation of scientific discovery, potentially allow unrealistic, ideological claims to dictate scientific research. This is particularly evident in biology, where the pervading paradigm is still dominated by a physically naïve reductionism in which the only relevant causative layer is the molecular one. 
The reductionist hypothesis is that everything one observes is the result of an underlying molecular mechanism almost independent of the context in which it operates. Molecular mechanisms are often necessarily studied in isolation and therefore operate in unnatural conditions. The mechanistic view of biological regulation implies that we think of genes as intelligent agents. Here we try to critically analyze the motivations behind the spread of such unrealistic simplifications. Alessandro Giuliani, Ignazio Licata, Carlo M. Modonesi, and Paolo Crosignani Copyright © 2011 Alessandro Giuliani et al. All rights reserved. Delineating Novel Signature Patterns of Altered Gene Expression in Schizophrenia Using Gene Microarrays Mon, 01 Jan 1900 00:00:00 +0000 Schizophrenia is a complex and devastating brain disorder that affects 1% of the population and ranks as one of the most costly disorders to afflict humans. This disorder typically has its clinical onset in late adolescence or early adulthood, presenting as a constellation of delusions and hallucinations (positive symptoms); decreased motivation, emotional expression, and social interactions (negative symptoms); and impaired learning and memory (cognitive symptoms). The etiology of schizophrenia is unknown, but appears to be multifaceted, with genetic and epigenetic developmental factors all implicated. A convergence of observations from clinical, neuroimaging, and anatomical studies has implicated the dorsal prefrontal cortex as a major locus of alterations in schizophrenia. Karoly Mirnics, Frank A. Middleton, David A. Lewis, and Pat Levitt Copyright © 2001 Karoly Mirnics et al. All rights reserved. Lipid Mediator Informatics and Proteomics in Inflammation-Resolution Mon, 01 Jan 1900 00:00:00 +0000 Lipid mediator informatics is an emerging area devoted to the identification of bioactive lipid mediators (LMs) and their biosynthetic profiles and pathways. 
LM informatics and proteomics applied to inflammation, systems tissues research provides a powerful means of uncovering key biomarkers for novel processes in health and disease. By incorporating them with systems biology analysis, we review here our initial steps toward elucidating relationships among a range of biomolecular classes and provide an appreciation of their roles and activities in the pathophysiology of disease. LM informatics employing liquid chromatography-ultraviolet-tandem mass spectrometry (LC-UV-MS/MS), gas chromatography-mass spectrometry (GC-MS), computer-based automated systems equipped with databases and novel searching algorithms, and enzyme-linked immunosorbent assay (ELISA) to evaluate and profile temporal and spatial production of mediators, combined with proteomics at defined points during experimental inflammation and its resolution, enables us to identify novel mediators in resolution. The automated system including databases and searching algorithms is crucial for prompt and accurate analysis of these lipid mediators biosynthesized from precursor polyunsaturated fatty acids such as eicosanoids, resolvins, and neuroprotectins, which play key roles in human physiology and many prevalent diseases, especially those related to inflammation. This review presents detailed protocols used in our lab for LM informatics and proteomics using LC-UV-MS/MS, GC-MS, ELISA, novel databases and searching algorithms, and 2-dimensional gel electrophoresis and LC-nanospray-MS/MS peptide mapping. Yan Lu, Song Hong, Katherine Gotlinger, and Charles Serhan Copyright © 2006 Yan Lu et al. All rights reserved. Combining the Performance Strengths of the Logistic Regression and Neural Network Models: A Medical Outcomes Approach Mon, 01 Jan 1900 00:00:00 +0000 The assessment of medical outcomes is important in the effort to contain costs, streamline patient management, and codify medical practices. 
As such, it is necessary to develop predictive models that will make accurate predictions of these outcomes. The neural network methodology has often been shown to perform as well as, if not better than, the logistic regression methodology in terms of sample predictive performance. However, the logistic regression method is capable of providing an explanation regarding the relationship(s) between variables. This explanation is often crucial to understanding the clinical underpinnings of the disease process. Given the respective strengths of the methodologies in question, the combined use of a statistical (i.e., logistic regression) and machine learning (i.e., neural network) technology in the classification of medical outcomes is warranted under appropriate conditions. The study discusses these conditions and describes an approach for combining the strengths of the models. Wun Wong, Peter J. Fos, and Frederick E. Petry Copyright © 2003 Wun Wong et al. All rights reserved.