Advances in Fuzzy Systems
Volume 2016, Article ID 6134736, 19 pages
http://dx.doi.org/10.1155/2016/6134736
Research Article

An Improved Fuzzy Based Missing Value Estimation in DNA Microarray Validated by Gene Ranking

1Department of Computer Science and Engineering, Heritage Institute of Technology, Kolkata 700107, India
2Department of Computer Science and Engineering, Netaji Subhash Engineering College, Kolkata 700152, India
3Department of Computer Science and Engineering, University of Engineering & Management, Kolkata 700156, India
4Department of Computer Science and Engineering, University of Calcutta, Kolkata 700098, India

Received 22 March 2016; Accepted 16 June 2016

Academic Editor: Gözde Ulutagay

Copyright © 2016 Sujay Saha et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Most gene expression data analysis algorithms require a complete gene expression matrix with no missing values, so methods that impute missing values accurately are needed. A number of imputation algorithms already exist for this purpose. This work starts from a microarray dataset containing multiple missing values. We first apply a modified version of the existing fuzzy theory based method LRFDVImpute to impute multiple missing values in time series gene expression data. We then validate the imputation result using a genetic algorithm (GA) based gene ranking methodology, alongside standard statistical validation techniques such as RMSE. To the best of our knowledge, gene ranking has not previously been used to validate the result of missing value estimation. The proposed method was first tested on the widely used Spellman dataset, where the results show that error margins are drastically reduced compared to some previous works, which indirectly validates the statistical significance of the proposed method. It was then applied to four other 2-class benchmark datasets, namely the Colorectal Cancer tumours dataset (GDS4382), the Breast Cancer dataset (GSE349-350), the Prostate Cancer dataset, and DLBCL-FL (Leukaemia), for both missing value estimation and gene ranking. The results show that the proposed method can reach 100% classification accuracy with very few dominant genes, which indirectly validates the biological significance of the proposed method.
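
The RMSE validation mentioned above can be illustrated with a minimal sketch. It assumes the usual evaluation setup in which known entries are artificially masked, imputed, and then compared with their true values; the function name `rmse_on_missing` and the toy matrices are hypothetical, and any normalization the paper applies to the error is not stated in the abstract and is not reproduced here.

```python
import math


def rmse_on_missing(original, imputed, missing_mask):
    """Root mean squared error over the masked (imputed) entries only.

    original, imputed: lists of rows (genes) with expression values.
    missing_mask: same shape; True where a value was treated as missing.
    All three names are illustrative, not taken from the paper.
    """
    squared_errors = [
        (true_value - estimate) ** 2
        for true_row, est_row, mask_row in zip(original, imputed, missing_mask)
        for true_value, estimate, is_missing in zip(true_row, est_row, mask_row)
        if is_missing
    ]
    if not squared_errors:
        raise ValueError("missing_mask selects no entries")
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy 2-gene x 3-sample matrix with two entries masked and re-estimated.
original = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
imputed = [[1.0, 2.5, 3.0], [4.0, 5.0, 5.0]]
mask = [[False, True, False], [False, False, True]]

score = rmse_on_missing(original, imputed, mask)
```

Restricting the error to masked entries keeps the untouched values from diluting the score; a lower value indicates estimates closer to the held-out measurements.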