Raman spectroscopy has grown into an essential tool for biomedical applications. Nevertheless, the weak Raman signal, associated mainly with biological samples, is often obscured by a broad background signal due to the intrinsic fluorescence of the organic molecules present, making further analysis unfeasible. A computational geometry method based on the definition of the convex hull is described for estimating the background in Raman spectra of samples of biological interest. The method is semiautomated, requiring sample-dependent user intervention. It does not depend, however, on curve fitting, requires no information about background distribution or source, and keeps the original spectral data intact.

1. Introduction

Raman spectroscopy has been extensively applied in recent years in a variety of biological research areas, ranging from in situ tissue diagnosis to the analysis of subcellular components. Being a vibrational spectroscopic technique based on inelastic scattering, Raman spectroscopy provides rich molecular information about the chemical composition of samples and exhibits high sensitivity to minute biochemical changes. Furthermore, it is attractive for biomedical studies since it is intrinsically nonintrusive and does not require external labels. The positions and relative intensities of the Raman bands are the basic spectral characteristics for exploring the structure and function of biological molecules. This interpretation, however, is often hindered by the broad background signal, mostly due to fluorescence from organic molecules and contaminants. The intensity of the fluorescence is usually much higher than the weak Raman signal in biological samples, and therefore background subtraction is an essential step in extracting reliable analytical information from biomedical Raman data.

Apart from instrument-specific design approaches, a number of computational methods have been proposed for background removal from Raman spectra. These methods include polynomial fitting [1–7], first- and second-order differentiation [8, 9], wavelet transformation [10–15], frequency-domain filtering [16], and principal component analysis (PCA) [17]. All of these methods have strengths and drawbacks that depend on the problem at hand. For example, low-order polynomial fitting is suitable for spectra with a broad background, but it is not effective for biological samples, whose Raman spectra feature several adjacent peaks that are not readily obvious. Higher-order polynomials may be susceptible to overfitting [2]. Differentiation may also distort peak shapes and therefore create a spectrum that is inconsistent with the preprocessed one [1]. Wavelet analysis, which is the Fourier transform analog for localized functions, is a promising solution, although the transformation of the signal into predetermined frequency bands may cause distortion in some parts of the spectra [15].

In the present study, we describe a novel semiautomated background removal method which is based on the geometric definition of convex hull [18]. The effectiveness of the method is demonstrated through theoretical and experimental biomedical Raman spectra.

2. Theoretical Background

The signal, S(x), is assumed to be composed of a low-frequency component (B(x), background) and the true information, P(x), so that S(x) = P(x) + B(x). The background is the slowly varying part of the composite signal and resides in the low-frequency range. By applying low-pass filtering, we extract from the composite signal a rough estimate of the true background component. This first step works by applying a Fourier transform to the signal, attenuating the high-frequency components, and inverse transforming the result. In this way, the signal is broken up into a superposition of sinusoids. Each sinusoid can be manipulated individually and then recombined to obtain an approximation to the original periodic function [19]. The second step is to decompose the signal into parts that have the characteristics of convex sets. This is accomplished by taking regions from peaks to valleys of the previously filtered signal, via a simple pattern search of a table of 0s and 1s that encodes the slope of the signal. A convex hull minimization routine supplies the single optimal solution for all sets [18] and is able to extract the true background part of each region by introducing a new parameter, the “median.” The latter is a line segment calculated from the statistical data average and is constructed, by definition, to divide the convex region into two parts. All points with values higher than the median belong to the upper part of the convex hull and represent the peaks, while the remaining points represent the true background. The only remaining problem is the continuity of each convex region with respect to the previous and the next one (each region is already continuous in the interior of its domain). The simplest approach is to define a user variable (joins) that controls the number of linking points from the lower part of the convex region which must be included in the final background array.
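The first two steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function names and the `keep` cutoff parameter are hypothetical, the discrete Fourier transform is written out naively for clarity, and cutting the filtered signal at its valleys is only one plausible reading of the 0/1 slope-table search.

```python
import cmath
import math


def dft(values):
    """Naive O(n^2) discrete Fourier transform (adequate for short signals)."""
    n = len(values)
    return [
        sum(values[k] * cmath.exp(-2j * math.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def low_pass(signal, keep):
    """Zero every frequency above `keep` (hypothetical cutoff) and invert."""
    n = len(signal)
    spectrum = dft(signal)
    for j in range(n):
        # Index j and its mirror n - j describe the same sinusoid.
        if min(j, n - j) > keep:
            spectrum[j] = 0
    # Inverse DFT; for real input the imaginary part is rounding noise only.
    return [
        sum(spectrum[j] * cmath.exp(2j * math.pi * j * k / n) for j in range(n)).real / n
        for k in range(n)
    ]


def slope_table(values):
    """1 where the signal rises to the next sample, 0 otherwise."""
    return [1 if b > a else 0 for a, b in zip(values, values[1:])]


def split_regions(values):
    """Cut at valleys, i.e. where the slope pattern changes from 0 to 1."""
    table = slope_table(values)
    cuts = [0]
    for i in range(1, len(table)):
        if table[i - 1] == 0 and table[i] == 1:
            cuts.append(i)
    cuts.append(len(values) - 1)
    return [(a, b) for a, b in zip(cuts, cuts[1:]) if b > a]


print(split_regions([3, 1, 2, 5, 4, 0, 6]))  # [(0, 1), (1, 5), (5, 6)]
```

In practice, `split_regions` would be applied to the output of `low_pass` rather than to the raw spectrum, so that noise does not create spurious valleys.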
The outcome captures every essential feature of the background component through a purely geometric, semiautomated procedure. Because the number of points is greatly reduced, the resulting background is suitable for subsequent polynomial interpolation, smoothing, and so forth.
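The point reduction and subsequent interpolation can be illustrated with a short Python sketch. Note that it swaps in a simpler technique than the one described above: instead of the “median” construction, whose details are not reproduced here, it keeps only the vertices of the lower convex hull (Andrew's monotone chain) and interpolates linearly between them. All function names are hypothetical, and the example assumes a single region whose background is convex.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def lower_hull(points):
    """Vertices of the lower convex hull, ordered by increasing x."""
    hull = []
    for p in sorted(points):
        # Drop the last vertex while it lies on or above the new lower edge.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull


def interpolate(hull, xs):
    """Piecewise-linear curve through the hull vertices, evaluated at ascending xs."""
    out = []
    seg = 0
    for x in xs:
        # Advance to the hull segment that contains x.
        while seg < len(hull) - 2 and x > hull[seg + 1][0]:
            seg += 1
        (x0, y0), (x1, y1) = hull[seg], hull[seg + 1]
        out.append(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
    return out


# Toy region: convex background (x - 3)^2 with one peak added at x = 2.
xs = [0, 1, 2, 3, 4, 5, 6]
values = [9, 4, 6, 0, 1, 4, 9]
hull = lower_hull(list(zip(xs, values)))
background = interpolate(hull, xs)
corrected = [v - b for v, b in zip(values, background)]
print(corrected)  # [0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0]
```

Seven samples are reduced to six hull vertices here, and the peak point is the one discarded. The original data are never modified, only the estimated background is subtracted from them.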

3. Materials and Methods

The algorithm was implemented in the Mathematica software package (Wolfram Research). For signals sampled at discrete intervals, as in our case, Mathematica uses the discrete Fourier transform [20]. Raman spectra were chosen from the literature for comparison purposes. The simulated data are identical to those from [15], while the experimental data were acquired with permission from [15] and the hyperSpec project (http://hyperspec.r-forge.r-project.org/).

4. Results and Discussion

A simulated spectrum consisting of three Gaussian peaks with a curved background and random noise is shown in Figure 1.
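A test signal of this kind is easy to generate. The Python sketch below is only an illustration: the peak positions, heights, and widths, the quadratic background, and the noise level are all hypothetical values chosen for readability, not the parameters of the simulated data taken from [15].

```python
import math
import random


def gaussian(x, center, height, width):
    """Gaussian peak with the given height and standard deviation `width`."""
    return height * math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


def simulate_spectrum(n=500, noise_sd=0.5, seed=0):
    """Return (xs, signal, background) for three peaks on a curved baseline."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    # (center, height, width) of each peak: hypothetical values.
    peaks = [(120, 40.0, 6.0), (250, 60.0, 8.0), (380, 30.0, 5.0)]
    xs = list(range(n))
    # Hypothetical smooth, curved background.
    background = [50.0 + 2e-4 * (x - 200) ** 2 for x in xs]
    pure = [sum(gaussian(x, c, h, w) for c, h, w in peaks) for x in xs]
    signal = [b + p + rng.gauss(0.0, noise_sd) for b, p in zip(background, pure)]
    return xs, signal, background


xs, signal, background = simulate_spectrum()
```

Because the true background is returned alongside the signal, any estimate can be compared against it point by point.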

As previously discussed, the first step, (a), is the low-pass filtering, the second step, (b), is finding and optimizing the convex sets, and the last one, (c), is joining the convex sets in a continuous manner. In the case of simulated data, the performance of the algorithm is flawless. Figures 2–4 depict the experimental Raman spectra of paracetamol, prednisone acetate tablets (PATs), and chondrocytes in cartilage, respectively.

It is evident that the more complicated the signal is, the more Fourier components are needed to approximate the experimental baseline curve. A rough approximation, however, is adequate even for complicated spectra with several bands (Figure 2). In all cases, the background is clearly defined and the signal that does not belong to peak areas is efficiently diminished. Since the Fourier transformation is applied not for smoothing but for extracting the geometric characteristics, the signal retains all its original features and distortions are avoided. Nevertheless, in some spectra with a low S/N ratio, negative peaks may appear in the estimated background (circle in Figure 3(c)), because the “median” is calculated from the slope of the whole signal rather than from its local slope. Fitting the data within each convex region would remove such artifacts. However, we did not introduce this computationally intensive improvement because (i) negative peaks appeared only once in our test cases and (ii) we tried to keep the method simple and purely geometric.

5. Conclusions

A computational geometry method for the estimation of the Raman background signal of highly fluorescent samples has been described in this study. Background subtraction was achieved in all cases while the peaks were preserved. The proposed algorithm is semiautomated and requires user input for two variables, which define the degree of the Fourier series approximation and the connection of the convex sets. The method is valid for all signals which are convex, that is, one-directional, and, as such, it could possibly be applied to other spectroscopic techniques as well as to X-ray powder diffractograms. Preliminary results suggest that it may be applicable to a broad range of spectroscopic data.


The authors thank Dr. Zhi-Min Zhang and Dr. Claudia Beleites for providing the raw data considered in this paper.