BioMed Research International
Volume 2015, Article ID 621690, 10 pages
http://dx.doi.org/10.1155/2015/621690
Research Article

The Impact of Normalization Methods on RNA-Seq Data Analysis

1 Department of Mathematical and Statistical Methods, Poznan University of Life Sciences, 60-637 Poznan, Poland
2 Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704 Poznan, Poland
3 Department of Hematology and Bone Marrow Transplantation, Poznan University of Medical Sciences, 60-569 Poznan, Poland

Received 20 March 2015; Revised 17 May 2015; Accepted 18 May 2015

Academic Editor: Ernesto Picardi

Copyright © 2015 J. Zyprych-Walczak et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

High-throughput sequencing technologies, such as the Illumina HiSeq, are powerful tools for investigating a wide range of biological and medical problems. The massive and complex data sets that sequencers produce create a need for statistical and computational methods that can handle data analysis and management. Data normalization is one of the most crucial steps of data processing and must be considered carefully, because it has a profound effect on the results of the analysis. In this work, we present a comprehensive comparison of five normalization methods related to sequencing depth that are widely used for transcriptome sequencing (RNA-seq) data, and we assess their impact on the results of gene expression analysis. Based on this study, we suggest a universal workflow for selecting the optimal normalization procedure for any particular data set. The workflow includes calculating the bias and variance values for the control genes, the sensitivity and specificity of the methods, and the classification errors, as well as generating diagnostic plots. Combining this information facilitates the selection of the most appropriate normalization method for the studied data sets and identifies which methods can be used interchangeably.
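
To make the idea of normalizing for sequencing depth concrete, the following is a minimal sketch rather than any of the five methods compared in the article, which the abstract does not name. It assumes a simple total-count scaling to counts per million and uses the variance of log-transformed values for a set of control genes as a rough diagnostic. The function names, the toy count matrix, and the choice of control genes are all hypothetical and serve only as illustration.

```python
from math import log2
from statistics import pvariance


def counts_per_million(counts):
    """Scale each sample (column) by its library size to counts per million.

    `counts` is a list of rows (genes), each row a list of per-sample counts.
    Total-count scaling is an assumption made for this sketch.
    """
    n_samples = len(counts[0])
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [
        [row[j] / lib_sizes[j] * 1e6 for j in range(n_samples)]
        for row in counts
    ]


def control_gene_variance(values, control_rows):
    """Mean across-sample variance of log2(value + 1) for the control genes.

    Control genes are expected to be stable, so a lower value suggests
    that less depth-related variation remains in the data.
    """
    variances = [
        pvariance([log2(v + 1) for v in values[i]]) for i in control_rows
    ]
    return sum(variances) / len(variances)


# Toy data (invented): 3 genes x 2 samples, sample 2 sequenced twice as deep.
raw = [[10, 20], [30, 60], [60, 120]]
controls = [0, 1]  # hypothetical control genes (row indices)

cpm = counts_per_million(raw)
raw_var = control_gene_variance(raw, controls)
cpm_var = control_gene_variance(cpm, controls)

print(f"control-gene variance before scaling: {raw_var:.4f}")
print(f"control-gene variance after scaling:  {cpm_var:.4f}")
```

In this toy case the two samples differ only in depth, so the scaling removes the difference between them for the control genes. Real data sets rarely behave this cleanly, which is why a comparison across several methods and several criteria is useful.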