Advances in Bioinformatics
Volume 2016 (2016), Article ID 3791214, 7 pages
http://dx.doi.org/10.1155/2016/3791214
Research Article

A Support Vector Machine Classification of Thyroid Bioptic Specimens Using MALDI-MSI Data

1Department of Medicine and Surgery, University of Milano-Bicocca, Via Cadore 48, 20900 Monza Brianza, Italy
2Department of Informatics, Systems and Communication, University of Milano-Bicocca, Viale Sarca 336, 20125 Milan, Italy
3Department of Surgery and Translational Medicine, Section of Pathology, University of Milano-Bicocca, Milan, Italy

Received 30 November 2015; Accepted 24 April 2016

Academic Editor: Rita Casadio

Copyright © 2016 Manuel Galli et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Biomarkers able to characterise and predict multifactorial diseases remain one of the most important targets of all “omics” investigations. In this context, Matrix-Assisted Laser Desorption/Ionisation-Mass Spectrometry Imaging (MALDI-MSI) has gained considerable attention in recent years, but it also produces a huge amount of complex data that must be processed and interpreted. For this reason, computational and machine learning procedures for biomarker discovery are important tools, both to reduce data dimensionality and to provide predictive markers for specific diseases. For instance, the availability of protein and genetic markers to support the diagnosis of thyroid lesions would have a deep impact on society, owing to the high frequency of undetermined reports (THY3), which are generally treated as malignant cases. In this paper we show how an accurate classification of thyroid bioptic specimens can be obtained by applying a state-of-the-art machine learning approach (i.e., Support Vector Machines) to MALDI-MSI data, together with a wrapper feature selection algorithm (i.e., recursive feature elimination). The model achieves accurate discrimination using only 20 out of 144 features, which improves its performance, reliability, and computational efficiency. Finally, tissue areas rather than average proteomic profiles are classified, highlighting potentially discriminating areas of clinical interest.
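As an illustration of the general idea of recursive feature elimination wrapped around a linear Support Vector Machine, the following minimal Python sketch reduces 144 features to 20. It is not the authors' implementation: the data are synthetic, the Pegasos-style stochastic subgradient trainer is a simple stand-in for a full SVM solver, and every name and parameter (`train_linear_svm`, `svm_rfe`, `make_synthetic`, `step`, `lam`, the sample count) is a hypothetical choice made only for this example.

```python
import random


def train_linear_svm(X, y, epochs=15, lam=0.01, seed=0):
    """Pegasos-style stochastic subgradient descent for a linear SVM.

    Stand-in trainer (assumption): no bias term, labels must be +1 / -1.
    Returns the weight vector, one weight per feature.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Regularisation shrink, then a hinge-loss step if the margin is violated.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def svm_rfe(X, y, n_keep=20, step=4):
    """Recursive feature elimination: retrain, then drop the features
    with the smallest squared weights, until n_keep features remain."""
    active = list(range(len(X[0])))  # original indices still in play
    while len(active) > n_keep:
        X_sub = [[row[j] for j in active] for row in X]
        w = train_linear_svm(X_sub, y)
        n_drop = min(step, len(active) - n_keep)
        ranked = sorted(range(len(active)), key=lambda k: w[k] ** 2)
        dropped = set(ranked[:n_drop])
        active = [f for k, f in enumerate(active) if k not in dropped]
    return active


def make_synthetic(n_samples=40, n_features=144, n_informative=5, seed=1):
    """Synthetic data (assumption): Gaussian noise everywhere, with the
    first n_informative features shifted according to the class label."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = 1 if i % 2 == 0 else -1
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in range(n_informative):
            row[j] += 2.0 * label
        X.append(row)
        y.append(label)
    return X, y


X, y = make_synthetic()
selected = svm_rfe(X, y, n_keep=20)
print(len(selected), "features kept:", selected)
```

Removing several features per iteration (`step=4`) only keeps the pure-Python run short; eliminating one feature at a time is the more conservative variant. On real MALDI-MSI data the feature matrix would hold spectral features per tissue area, and a dedicated SVM library would normally replace the toy trainer.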