Journal of Spectroscopy

Volume 2019, Article ID 4296153, 8 pages

https://doi.org/10.1155/2019/4296153

## Superparamagnetic Clustering of Diabetes Patients Raman Spectra

^{1}Biophysics and Biomedical Sciences Laboratory, Centro Universitario de Lagos, Universidad de Guadalajara, Enrique Díaz de León S/N Paseo de la Montaña, CP 47460, Lagos de Moreno, Jal, Mexico

^{2}Departamento de Ingeniería, Universidad Iberoamericana León, Blvd. Jorge Vértiz Campero, Fracciones Canadá de Alfaro, CP 37238, León, Guanajuato, Mexico

^{3}Centro de Ciencias de la Salud, Universidad Autónoma de Aguascalientes, Av. Universidad 940, Aguascalientes 20131, Mexico

Correspondence should be addressed to J. L. González-Solís; jluis0968@gmail.com

Received 28 April 2019; Revised 26 August 2019; Accepted 17 September 2019; Published 5 November 2019

Academic Editor: Nikša Krstulović

Copyright © 2019 J. L. González-Solís et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

In this paper, we present an alternative to the standard methods for classifying Raman spectra. Its grouping process is based on a clustering phenomenon observed in nature at the atomic level and correctly described by the statistical physics model known as the Potts model, which represents interacting spins on a crystalline lattice. This clustering method, known as superparamagnetic clustering (SPC), allows hierarchical structures in data banks to be identified. In this method, we assigned a Potts spin to each data point (Raman spectrum) and introduced an interaction between neighboring points whose coupling strength is a decreasing function of the distance between them. We found a hierarchical tree structure in our data bank of Raman spectra, allowing us to discriminate between the spectra from control and diabetes patients. The sensitivity and specificity of the diabetes detection technique by Raman spectroscopy were calculated directly because the SPC method accurately determines the members of each cluster. As a cross-check, SPC results were compared with published results of multivariate analysis, showing excellent agreement; however, the SPC method additionally allows the members of all identified clusters to be determined explicitly.

#### 1. Introduction

In recent years, spectroscopic techniques such as Raman spectroscopy, Fourier-transform infrared spectroscopy, X-ray spectroscopy, and mass spectroscopy have become fundamental tools in the fields of chemistry, drugs, the agro-food sector, life sciences, and environmental analysis to study different biological systems based on the chemical and structural composition of biological samples [1–3].

In these techniques, once spectra are captured, mathematical tools to classify them are required; however, spectra corresponding to biological samples usually show a high complexity because they contain a large number of peaks of different intensities and forms, unlike spectra corresponding to nonbiological samples where discrimination between a pair of samples turns out to be relatively simple. Furthermore, the study of complex systems, where the comparison between a large set of spectra is necessary, has motivated the application of novel methods that allow identifying patterns in large banks of spectra.

The main techniques applied in the analysis of spectra are multivariate analysis (principal component analysis and linear discriminant analysis) [4, 5] and clustering analysis (*K*-means and spectral norm methods) [6]. Among the clustering methods, those that allow the exploration of hierarchical structures in data banks are of particular interest, because they facilitate the study of diseases that are classified into different types or that show various stages of progression [4].

Among these hierarchical clustering methods, one has attracted particular interest because its clustering process is based on a clustering phenomenon observed in nature at the atomic level and correctly described by a statistical physics model known as the Potts model, which represents interacting spins on a crystalline lattice. This method is known as the SPC method, and it has already been successfully applied in the discrimination between leukemia, breast cancer, and cervical cancer [7]. The method has also been applied to study gene expression [8, 9] and protein sequences [10]. Moreover, because the temporal evolution of stock market returns is well described by random processes, SPC has been used for stock exchange analysis as well [11, 12].

In this paper, we propose the SPC method as a novel way to classify Raman spectra, with the aim of observing a hierarchical structure in the bank of spectra and identifying the Raman spectra corresponding to healthy and type 2 diabetes patients. Combined, the SPC method and Raman spectroscopy could provide a diabetes detection method with high sensitivity and specificity.

#### 2. SPC Method

In the ferromagnetic model, each point is considered to have a Potts spin, which takes one of *q* integer values, *s*_{i} = 1, 2, …, *q*. The distance matrix, *d*_{ij}, represents the Euclidean distances between neighboring sites *i* and *j*. Input data for the SPC method are represented by this distance matrix, which contains all the distances between the data points. The distance matrix is used to construct a graph whose vertices are the data points and whose edges correspond to connections between neighboring points. Two points are considered to be neighbors (and thus have an edge) if they are within the *K*-nearest neighbors of each other.
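The graph construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `euclidean_distance_matrix` and `mutual_knn_edges`, the toy 2-D points standing in for spectra, and the choice `k=2` are all assumptions made for the example.

```python
import math

def euclidean_distance_matrix(points):
    """Return the full symmetric matrix d[i][j] of Euclidean distances."""
    return [[math.dist(p, r) for r in points] for p in points]

def mutual_knn_edges(d, k):
    """Return sorted edges (i, j), i < j, where i and j are each among
    the other's k nearest points according to the distance matrix d."""
    n = len(d)
    nearest = []
    for i in range(n):
        # Rank all other points by their distance to point i.
        others = sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])
        nearest.append(set(others[:k]))
    # Keep an edge only when the neighbor relation holds in both directions.
    return sorted((i, j) for i in range(n) for j in nearest[i]
                  if i < j and i in nearest[j])

# Toy stand-in for spectra (assumption): two well-separated groups of points.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
d = euclidean_distance_matrix(points)
edges = mutual_knn_edges(d, k=2)
print(edges)
```

Requiring the neighbor relation to be mutual keeps an isolated point from attaching itself to a distant dense group, so the two toy groups share no edge.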

A pair of neighboring points *i* and *j* that have the same spin (*s*_{i} = *s*_{j}) interacts via a short-range coupling:

$$J_{ij} = \frac{1}{\hat{K}} \exp\left(-\frac{d_{ij}^{2}}{2a^{2}}\right),$$

where *d*_{ij} is the Euclidean distance between points *i* and *j*, *a* is the mean distance between interacting neighbors, and $\hat{K}$ is the average number of interacting neighbors of a point [13–15]. The strength *J*_{ij} is a decreasing function of the distance *d*_{ij}, so the closer two points are to each other, the more they tend to belong to the same cluster. The interaction between points that are not neighbors is set to zero.
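As a hedged sketch of how the couplings could be computed, the snippet below assumes the Gaussian form $J_{ij} = \frac{1}{\hat{K}}\exp(-d_{ij}^2 / 2a^2)$ of Blatt et al. [13–15]. It also assumes that $\hat{K}$ is estimated as twice the number of edges divided by the number of points. The function name `couplings` and the toy distance matrix are illustrative only.

```python
import math

def couplings(edges, d):
    """Return {(i, j): J_ij} for neighbor edges, using the assumed form
    J_ij = (1 / K_hat) * exp(-d_ij**2 / (2 * a**2))."""
    n = len(d)
    # a: mean distance between interacting neighbors.
    a = sum(d[i][j] for i, j in edges) / len(edges)
    # K_hat: average number of neighbors per point (each edge counts twice).
    k_hat = 2.0 * len(edges) / n
    return {(i, j): math.exp(-d[i][j] ** 2 / (2.0 * a ** 2)) / k_hat
            for i, j in edges}

# Toy distance matrix for three points on a line (assumption).
d = [[0.0, 1.0, 3.0],
     [1.0, 0.0, 2.0],
     [3.0, 2.0, 0.0]]
edges = [(0, 1), (1, 2)]   # points 0 and 2 are not neighbors
J = couplings(edges, d)
print(J)
```

Pairs that are absent from `edges` simply have no entry in `J`, which corresponds to a coupling of zero between non-neighbors.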

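The energy function of the system is given by the Hamiltonian of an inhomogeneous ferromagnetic Potts model:

$$H(S) = -\sum_{\langle i,j \rangle} J_{ij}\,\delta_{s_i,s_j},$$

where the notation $\langle i,j \rangle$ stands for neighboring sites *i* and *j*, and the summation is over interacting neighbors. $S = \{s_1, \ldots, s_N\}$ is the state of the system, and the delta function is $\delta_{s_i,s_j} = 1$ if *s*_{i} = *s*_{j} and zero if *s*_{i} ≠ *s*_{j}. The thermodynamic average of a physical quantity *A* at a temperature *T* can be calculated using $\langle A \rangle = \sum_{S} A(S)\,P(S)$, where $P(S) = \frac{1}{Z}\exp\left(-H(S)/T\right)$ is the Boltzmann probability density and *Z* is the partition function, $Z = \sum_{S} \exp\left(-H(S)/T\right)$.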
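For a very small system, the thermodynamic average can be evaluated exactly by enumerating every state, which makes the roles of the Hamiltonian, the Boltzmann weight, and the partition function concrete. The sketch below assumes the sign convention $H(S) = -\sum J_{ij}\delta_{s_i,s_j}$ and uses zero-based spin values for convenience. Adding a constant to $H$ would leave the averages unchanged. The names `potts_energy` and `boltzmann_average` are hypothetical.

```python
import itertools
import math

def potts_energy(spins, J):
    """H(S) = -sum of J_ij over neighbor pairs whose spins are equal."""
    return -sum(Jij for (i, j), Jij in J.items() if spins[i] == spins[j])

def boltzmann_average(quantity, n, q, J, T):
    """Exact <A> = sum_S A(S) P(S), enumerating all q**n states.
    Only feasible for tiny n; real data sets need Monte Carlo sampling."""
    Z = 0.0        # partition function
    total = 0.0    # sum of A(S) * exp(-H(S) / T)
    for spins in itertools.product(range(q), repeat=n):
        weight = math.exp(-potts_energy(spins, J) / T)
        Z += weight
        total += quantity(spins) * weight
    return total / Z

# Two coupled spins with q = 2 (toy values, assumption).
J = {(0, 1): 1.0}
aligned = boltzmann_average(lambda s: 1.0 if s[0] == s[1] else 0.0,
                            n=2, q=2, J=J, T=1.0)
print(aligned)
```

In this two-spin case the probability of alignment has the closed form $e^{J/T} / (e^{J/T} + 1)$, which the enumeration reproduces.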

A Potts system may have three different phases depending on the temperature and interactions: ferromagnetic, paramagnetic, or superparamagnetic phase. The system is ferromagnetic at low temperatures and paramagnetic at high temperatures. By increasing the temperature from zero, the system passes from the ferromagnetic to the paramagnetic state either directly in a single transition or via the intermediate superparamagnetic phase. This last phase is of considerable interest in the study of disordered systems, especially in the context of data clustering as clusters of aligned spins automatically divide the data into their natural classes, and a clear hierarchical structure among the classes emerges when varying the temperature.

The average spin-spin correlation function, $\langle \delta_{s_i,s_j} \rangle$, is used to decide whether or not two spins belong to the same cluster. In contrast to the mere interpoint distance, the spin-spin correlation function is sensitive to the collective behavior of the system and is, therefore, a suitable quantity for defining clusters.
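One way to turn sampled spin configurations into clusters is sketched below. It estimates the correlation of each neighboring pair as the fraction of samples in which the two spins agree, then links pairs whose correlation exceeds a threshold. The threshold of 0.5 and the function names are assumptions for illustration, and this is a simplified version of the cluster-building step rather than the full procedure of Blatt et al.

```python
def pair_correlation(samples, i, j):
    """Fraction of sampled configurations in which spins i and j agree."""
    return sum(1 for s in samples if s[i] == s[j]) / len(samples)

def clusters_from_correlation(samples, edges, threshold=0.5):
    """Link neighbors whose estimated correlation exceeds the threshold
    and return the resulting connected components as sorted lists."""
    n = len(samples[0])
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in edges:
        if pair_correlation(samples, i, j) > threshold:
            parent[find(i)] = find(j)

    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())

# Four hand-written configurations of four spins (toy data, assumption).
samples = [(0, 0, 1, 1), (2, 2, 0, 0), (1, 1, 1, 2), (0, 0, 2, 2)]
edges = [(0, 1), (1, 2), (2, 3)]
clusters = clusters_from_correlation(samples, edges)
print(clusters)
```

Here points 1 and 2 are graph neighbors but agree in only one of four samples, so they end up in different clusters even though an edge connects them.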

In this study, the SPC method, as Blatt et al. describe it [14, 15], was applied. Blatt et al. used the Swendsen–Wang Monte Carlo Simulation [16, 17] to generate a Markov chain in the Potts model. In the procedure, an initial configuration is generated by assigning a random value (spin) to each point. Subsequently, frozen bonds are assigned between nearest neighboring points *i* and *j* with a probability

$$p_{ij} = 1 - \exp\left(-\frac{J_{ij}}{T}\,\delta_{s_i,s_j}\right).$$

Thus, subgraphs are formed by the points connected through frozen bonds. A new configuration is then created: each subgraph is assigned a new, randomly chosen spin value, so that all spins belonging to the same subgraph take the same value. This procedure is repeated up to a maximum number of iterations.
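A single update of this kind can be sketched as below. The code assumes the standard Swendsen–Wang bond probability $p_{ij} = 1 - \exp(-J_{ij}/T)$ for neighbors with equal spins (and zero otherwise), uses zero-based spin values, and tracks subgraphs with a union-find structure. The function name, the toy couplings, and the number of sweeps are illustrative assumptions.

```python
import math
import random

def swendsen_wang_step(spins, J, q, T, rng):
    """Perform one Swendsen-Wang update and return the new configuration."""
    n = len(spins)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Freeze bonds between equal-spin neighbors with probability p_ij.
    for (i, j), Jij in J.items():
        if spins[i] == spins[j] and rng.random() < 1.0 - math.exp(-Jij / T):
            parent[find(i)] = find(j)

    # Give every subgraph one new spin value drawn uniformly from 0..q-1.
    new_value = {}
    result = []
    for v in range(n):
        root = find(v)
        if root not in new_value:
            new_value[root] = rng.randrange(q)
        result.append(new_value[root])
    return result

# Toy chain of four points with equal couplings (assumption).
rng = random.Random(0)
J = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0}
spins = [rng.randrange(3) for _ in range(4)]
for _ in range(10):                      # a few sweeps
    spins = swendsen_wang_step(spins, J, q=3, T=0.5, rng=rng)
print(spins)
```

Because whole subgraphs are flipped at once, successive configurations decorrelate faster than with single-spin updates, which is the usual motivation for this algorithm.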

To select the temperature at which the clusters nested in hierarchies emerge, the magnetic susceptibility, which measures the variance of the magnetization *m*, $\chi = \frac{N}{T}\left(\langle m^{2} \rangle - \langle m \rangle^{2}\right)$, where *N* is the number of data points, is calculated [18]. The peaks of *χ* indicate phase transitions: the transition between the ordered state (magnetic) and the partially ordered state (superparamagnetic), as well as the transition between the partially ordered state and the unordered state (nonmagnetic). Starting at low temperature and increasing it, *χ* increases quickly when clusters begin to split. As the temperature is raised, the system may break first into two clusters, each of which breaks into more subclusters, and so on. Such a hierarchical structure of the magnetic clusters reflects a hierarchical organization of the data into classes and subclasses.
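The susceptibility can be estimated from sampled configurations as sketched here, assuming $\chi = \frac{N}{T}(\langle m^2 \rangle - \langle m \rangle^2)$. The text does not define *m*, so the code assumes the magnetization of Blatt et al., $m = (q N_{\max}/N - 1)/(q - 1)$, where $N_{\max}$ is the size of the largest group of equal spins. The function names and sample data are hypothetical.

```python
from collections import Counter

def magnetization(spins, q):
    """Assumed definition: m = (q * N_max / N - 1) / (q - 1), where N_max
    is the number of spins sharing the most common value."""
    n_max = max(Counter(spins).values())
    return (q * n_max / len(spins) - 1.0) / (q - 1.0)

def susceptibility(samples, q, T):
    """chi = (N / T) * (<m^2> - <m>^2), estimated over the given samples."""
    n = len(samples[0])
    ms = [magnetization(s, q) for s in samples]
    mean = sum(ms) / len(ms)
    mean_sq = sum(m * m for m in ms) / len(ms)
    return n / T * (mean_sq - mean ** 2)

# Two toy configurations of four spins with q = 2 (assumption).
samples = [(0, 0, 0, 0), (0, 0, 1, 1)]
chi = susceptibility(samples, q=2, T=1.0)
print(chi)
```

In practice the estimate would be repeated over a grid of temperatures, and the temperatures where `chi` peaks would be inspected as candidate phase transitions.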

After the clusters have been determined, the most natural clusters (clusters without substructures) are identified. The natural clusters were chosen using the sequential procedure proposed by Ott et al., which takes those clusters that have the largest *T*-range (denoted by *T*_{cl}) [19]. Ott et al. define the *T*-stability, *S*_{T}, of a cluster as

$$S_{T} = \frac{T_{\mathrm{cl}}}{T_{\max}},$$

where *T*_{max} is the temperature of the paramagnetic transition. Thus, *S*_{T} expresses the stability of the cluster relative to the stability of the whole data set. This procedure stops in a branch if no more stable substructures can be found, i.e., if the most stable cluster detected is less stable than a threshold value *S*_{ϴ} (*S*_{T} < *S*_{ϴ}). The natural clusters themselves do not have any substructures since they show a direct transition from the ferromagnetic phase to the paramagnetic phase, so the temperature that marks the end of the ferromagnetic phase, *T*_{ferro}, is a good indicator of how natural a cluster is. Thus, *S*_{ϴ} is the main control parameter that is set from outside.
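The stopping rule can be illustrated with a simplified recursive descent over a cluster tree, assuming $S_T = T_{\mathrm{cl}} / T_{\max}$. This is a sketch of the thresholding idea only, not the full sequential procedure of Ott et al. The dictionary layout (`members`, `t_range`, `children`), the function names, and all numeric values are hypothetical.

```python
def t_stability(t_range, t_max):
    """S_T = T_cl / T_max, with T_cl the temperature range of the cluster."""
    return t_range / t_max

def natural_clusters(node, t_max, s_theta):
    """Descend the tree while children are at least as stable as s_theta.
    A node whose children all fall below the threshold is reported as a
    natural cluster (simplification: unstable siblings are dropped)."""
    stable = [c for c in node["children"]
              if t_stability(c["t_range"], t_max) >= s_theta]
    if not stable:
        return [node["members"]]
    found = []
    for child in stable:
        found.extend(natural_clusters(child, t_max, s_theta))
    return found

# Hypothetical tree: the root splits into two stable clusters, and the
# first one splits further into short-lived fragments.
tree = {
    "members": [0, 1, 2, 3, 4, 5], "t_range": 0.10, "children": [
        {"members": [0, 1, 2], "t_range": 0.08, "children": [
            {"members": [0, 1], "t_range": 0.010, "children": []},
            {"members": [2], "t_range": 0.005, "children": []},
        ]},
        {"members": [3, 4, 5], "t_range": 0.06, "children": []},
    ],
}
result = natural_clusters(tree, t_max=0.2, s_theta=0.2)
print(result)
```

Raising `s_theta` makes the descent stop earlier and return coarser clusters, while lowering it admits finer and less stable substructures.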

#### 3. Methodology

We applied the SPC method to study the hierarchical structure of a data bank whose elements are Raman spectra. The data bank is made up of 182 Raman spectra: 102 spectra from control patients and 80 spectra from diabetes patients. Each spectrum is composed of 2330 peaks with their respective intensities. The Raman spectra were measured from blood serum samples obtained from 15 patients who were clinically diagnosed with type 2 diabetes mellitus and from 20 healthy volunteer controls. All patients were from the western central region of Mexico and had similar ethnic and socioeconomic backgrounds. To measure the Raman spectra, we focused an 830 nm laser (Jobin-Yvon LabRAM HR800 Raman apparatus) on different points of a small serum sample. To ensure statistically sound sampling, around five spectra from different regions of each serum sample were collected. Details of the samples used and spectra measured in the study are shown in Table 1.