International Journal of Analytical Chemistry

Volume 2018, Article ID 8032831, 8 pages

https://doi.org/10.1155/2018/8032831

## Authenticity Detection of Black Rice by Near-Infrared Spectroscopy and Support Vector Data Description

^{1}Key Lab of Process Analysis and Control of Sichuan Universities, Yibin University, Yibin, Sichuan 644000, China; ^{2}Hospital, Yibin University, Yibin, Sichuan 644000, China; ^{3}The First Affiliated Hospital, Chongqing Medical University, Chongqing 400016, China

Correspondence should be addressed to Chao Tan; chaotan1112@163.com

Received 26 March 2018; Revised 3 June 2018; Accepted 11 June 2018; Published 9 July 2018

Academic Editor: Richard G. Brereton

Copyright © 2018 Hui Chen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Black rice is an important rice species in Southeast Asia. Passing low-priced black rice off as a high-priced brand for economic benefit is common, especially in some remote towns. There is an increasing need for fast, easy-to-use, and low-cost analytical methods for authenticity detection. The feasibility of using near-infrared (NIR) spectroscopy and support vector data description (SVDD) for this purpose is explored. Principal component analysis (PCA) is used for exploratory analysis and feature extraction. Two other data description methods, i.e., k-nearest neighbor data description (KNNDD) and the GAUSS method, are used as references. A total of 142 samples from three brands were collected for spectral analysis. Each time, the samples of one brand serve as the target class whereas the other samples serve as the outlier class. Based on both the first two principal components (PCs) and the original variables, three types of data descriptions were constructed. On average, the optimized SVDD model achieves acceptable performance, i.e., a specificity of 100% and a sensitivity of 94.2% on the independent test set with a tight boundary. This indicates that SVDD combined with NIR is feasible and effective for authenticity detection of black rice.

#### 1. Introduction

Black rice is an economically important special rice species and has been consumed for a long time in Southeast Asia, including China [1–3]. Many studies have shown that black rice and its extracts have considerably strong free-radical scavenging and antioxidant effects, as well as other biological effects such as antimutagenic and anticarcinogenic activity [4, 5]. In terms of nutrition, black rice is also valued for its protein content and its balance of essential amino acids, and it contains a mixture of various carbohydrates. The amounts of nutrients vary among different kinds of black rice because of genetic and environmental factors. Many brands of black rice are on the market. Their quality and price vary greatly, and renowned brands command higher prices. However, unscrupulous traders often pass low-priced black rice off as a high-priced brand for economic benefit, especially in some remote towns.

Discriminating between different types of black rice is therefore of practical interest. Up to now, it has depended mainly on human senses. More objective methods are usually based on complex instruments such as high performance liquid chromatography or mass spectrometry (MS) [6]. In recent years, molecular spectroscopy has drawn more attention and proved to be a powerful tool for authenticity detection [7–9]. In particular, near-infrared (NIR) spectroscopy has become the most widely used technique in various fields including cigarettes [10], food [11], textile [12], medicine [13], and drug [14]. It is capable of rapidly obtaining a vector/matrix signal of a complex sample and therefore provides the chance of executing an in-depth qualitative or quantitative analysis. Detection of food authenticity is an important task in food analysis and aims to determine, from its spectral signal, which class a particular sample belongs to. Often, this can be done by comparing the spectra of a specimen to be identified with the spectra of “known” or “standard” samples. In NIR spectroscopy, however, the spectral signals of complex food systems are characterized by peak overlapping and poor resolution. An appropriate chemometric model is therefore indispensable for an NIR-based application.

From the perspective of modeling, chemometric methods for qualitative tasks can be divided into two categories: classification and one-class classification, i.e., data description [15]. Classification is very often considered a synonym of discriminant analysis, since such methods assign a new sample to one of a set of predefined classes. The corresponding classifier is trained on a training set. Data description differs from conventional classification in one essential aspect: it is assumed that only information on a single class is available. Data description problems are common in the real world, where positive objects are widely available but negative ones may be hard, expensive, or even impossible to gather [16]. In the literature, three main approaches can be distinguished: density estimation, boundary methods, and reconstruction methods [17]. A general requirement of any authentication problem is that a genuine class, i.e., a target class, must be known [18, 19]. The target class is always unique for a specific authentication problem. Any other objects, or classes of objects, that are not members of the target class are considered outliers. This also means that only samples of the target class can be utilized and that no information on the other classes is present. For data description, the boundary surrounding the target class has to be estimated from the available data such that it accepts as many of the target samples as possible while minimizing the error of accepting outliers. Up to now, much effort has been expended on developing classification algorithms, but the concept of data description has also attracted notable interest [20–23], especially in cases where it is impossible to meaningfully define all of the classes and obtain fully representative samples. In food authenticity, the interest is focused on a single target class so as to verify the compliance of samples with the features of that class, and a data description approach should be adopted to build an enclosed boundary around the target class.

The present work explores the feasibility of using near-infrared (NIR) spectroscopy and support vector data description (SVDD) for authenticity detection of black rice. Principal component analysis (PCA) is used for exploratory analysis and feature extraction. Two other data description methods, i.e., k-nearest neighbor data description (KNNDD) and the GAUSS method, are used as references. A total of 142 samples from three brands were collected for spectral analysis. All spectra were preprocessed beforehand by the standard normal variate (SNV) transformation. Each time, the samples of one brand serve as the target class whereas the other samples serve as the outlier class. Based on both the first two principal components (PCs) and the original variables, three types of data descriptions were constructed. On average, the optimized SVDD model achieves acceptable performance, i.e., a specificity of 100% and a sensitivity of 94.2% on the independent test set with a tight boundary. The effects of the training set size and of the kernel width parameter are also discussed. The results indicate that SVDD combined with NIR is feasible and effective for authenticity detection of black rice.

#### 2. Theory and Methods

Many methods have been developed to solve the one-class, or data description, problem; they can be divided into three main categories: density, boundary, and reconstruction methods. Here, three algorithms, i.e., support vector data description, the nearest neighbor method, and the Gaussian method, are introduced and used for experiments. The first two are boundary methods and the last is a density method.

##### 2.1. Support Vector Data Description (SVDD)

SVDD is an algorithm for one-class classification problems, proposed by Tax [15] and inspired by the idea of support vector machines. It focuses on finding a minimum hypersphere around the target class. The hypersphere can be used to decide whether new objects are targets or outliers. Such a sphere is characterized only by its center $\mathbf{a}$ and radius $R$. When seeking the sphere, the volume of the sphere is minimized by minimizing $R^2$, with the demand that the sphere covers as many training samples as possible. Given the training set $\{\mathbf{x}_i\}$, $i = 1, \ldots, N$, the task in SVDD is to minimize the error function

$$F(R, \mathbf{a}, \xi_i) = R^2 + C \sum_{i} \xi_i, \quad \text{subject to} \quad \|\mathbf{x}_i - \mathbf{a}\|^2 \le R^2 + \xi_i, \quad \xi_i \ge 0,$$

where $\mathbf{a}$ and $R$ are the center and the radius of the hypersphere, respectively; $C$ is the penalty factor, which regulates the trade-off between the hyperspherical volume and the error, i.e., the number of target objects rejected; and $\xi_i$ is a slack variable for the allowable error. Almost all objects are within the sphere. This optimization problem can be solved by the Lagrange multiplier method [24].
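For orientation, the Lagrange multiplier step can be written out as follows. This is the standard dual form of the SVDD problem as usually given in the SVDD literature, not an equation reproduced from this article. The symbols $\alpha_i$ (Lagrange multipliers) and $\mathbf{z}$ (a test object) are introduced here for illustration; $\mathbf{x}_i$ are the training objects, $\mathbf{a}$ the center, $R$ the radius, and $C$ the penalty factor.

```latex
\begin{aligned}
\max_{\alpha}\quad & L = \sum_{i} \alpha_i \,(\mathbf{x}_i \cdot \mathbf{x}_i)
      - \sum_{i,j} \alpha_i \alpha_j \,(\mathbf{x}_i \cdot \mathbf{x}_j) \\
\text{subject to}\quad & 0 \le \alpha_i \le C, \qquad \sum_{i} \alpha_i = 1, \\
\text{center:}\quad & \mathbf{a} = \sum_{i} \alpha_i \mathbf{x}_i, \\
\text{accept } \mathbf{z} \text{ if}\quad &
   \|\mathbf{z} - \mathbf{a}\|^2
   = (\mathbf{z} \cdot \mathbf{z})
   - 2 \sum_{i} \alpha_i \,(\mathbf{z} \cdot \mathbf{x}_i)
   + \sum_{i,j} \alpha_i \alpha_j \,(\mathbf{x}_i \cdot \mathbf{x}_j)
   \le R^2 .
\end{aligned}
```

Only objects with $\alpha_i > 0$ (the support vectors) contribute to the center and the boundary, and the data enter only through inner products, which is what allows a kernel function to be substituted for them.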

Because the target class is not spherically distributed in most cases, a rigid spherical boundary in the original space may not work well. To obtain a more effective and flexible decision boundary, the original data can be implicitly mapped to a higher-dimensional space by a so-called kernel function $K(\mathbf{x}_i, \mathbf{x}_j)$. Several kernel functions, including linear, polynomial, Gaussian, and radial basis function (RBF) kernels, are available [25, 26]. In this work, the RBF kernel, the most commonly used kernel in machine learning, was used. The form of the RBF kernel is

$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\frac{\|\mathbf{x}_i - \mathbf{x}_j\|^2}{\sigma^2}\right),$$

where $\sigma$ is a key parameter for controlling the boundary tightness.
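The role of the kernel width can be illustrated with a minimal sketch. It assumes the RBF form $\exp(-\|\mathbf{x}_i - \mathbf{x}_j\|^2/\sigma^2)$; other texts use $2\sigma^2$ in the denominator. The function names, the toy 2-D points, and the uniform weights `alpha` are hypothetical: the weights are placeholders that sum to one, not the solution of the SVDD optimization, and the points are not data from this study.

```python
import math

def rbf_kernel(x, y, sigma):
    """Gaussian RBF kernel, assumed form exp(-||x - y||^2 / sigma^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / sigma ** 2)

def kernel_distance_to_center(z, train, alpha, sigma):
    """Squared feature-space distance from z to the center sum_i alpha_i * phi(x_i)."""
    k_zz = rbf_kernel(z, z, sigma)  # equals 1 for the RBF kernel
    cross = sum(a_i * rbf_kernel(z, x_i, sigma) for a_i, x_i in zip(alpha, train))
    const = sum(
        a_i * a_j * rbf_kernel(x_i, x_j, sigma)
        for a_i, x_i in zip(alpha, train)
        for a_j, x_j in zip(alpha, train)
    )
    return k_zz - 2.0 * cross + const

# Hypothetical 2-D scores (for example two PCs); not data from the study.
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
alpha = [0.25] * len(train)  # placeholder weights, NOT an optimized SVDD solution
near, far = (0.5, 0.5), (4.0, 4.0)

for sigma in (0.5, 2.0):
    d_near = kernel_distance_to_center(near, train, alpha, sigma)
    d_far = kernel_distance_to_center(far, train, alpha, sigma)
    print(f"sigma={sigma}: near={d_near:.3f}, far={d_far:.3f}")
```

In a full SVDD model the weights would come from the optimization, and a test object would be accepted when this distance does not exceed the squared radius.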

##### 2.2. Nearest Neighbor Method

The most straightforward way to obtain a one-class model is to estimate the density of the training set. Unfortunately, this often requires a large number of samples to avoid the curse of dimensionality. Instead of estimating whole probability densities, an indication of the resemblance can also be acquired by comparing distances. The nearest neighbor method can be derived from a local density estimation [27]. It avoids explicit density estimation by using only distances to the first nearest neighbor. In the process of density estimation, a cell, often a hypersphere in $d$-dimensional space, is centered around the test object $\mathbf{z}$. The cell volume is grown until it contains $k$ objects from the training set. The local density can be estimated by

$$p_{\mathrm{NN}}(\mathbf{z}) = \frac{k/N}{V_k\left(\left\|\mathbf{z} - \mathrm{NN}_k^{tr}(\mathbf{z})\right\|\right)},$$

where $\mathrm{NN}_k^{tr}(\mathbf{z})$ is the $k$th nearest neighbor of $\mathbf{z}$ in the training set, $V_k$ is the volume of the cell containing this object, and $N$ is the number of training objects. Later, we will use KNNDD to denote this method.

For an unknown test object $\mathbf{z}$, the distance from it to its nearest neighbor in the training set, $\mathrm{NN}^{tr}(\mathbf{z})$, is compared with the distance from $\mathrm{NN}^{tr}(\mathbf{z})$ to its own nearest neighbor. The test object is accepted when its local density is larger than or equal to the local density of its nearest neighbor. The method seems to be very useful for distributions characterized by fast decaying probabilities. It can easily be generalized to a larger number of neighbors $k$: instead of taking the first nearest neighbor into account, the $k$th neighbor is considered.
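The acceptance rule described above can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the DD toolbox routine used in the study: the function names, the `threshold` argument (fixed at 1 to mirror the "larger than or equal to" rule), and the toy 2-D points are all hypothetical.

```python
import math

def euclid(a, b):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def kth_neighbor(point, train, k=1, exclude_index=None):
    """Return (distance, index) of the k-th nearest training object to `point`."""
    dists = sorted(
        (euclid(point, x), i) for i, x in enumerate(train) if i != exclude_index
    )
    return dists[k - 1]

def knndd_accept(z, train, k=1, threshold=1.0):
    """Accept z if its k-th neighbor distance does not exceed that of the neighbor itself."""
    d_z, idx = kth_neighbor(z, train, k)
    # Distance from that neighbor to its own k-th neighbor (excluding itself).
    d_nn, _ = kth_neighbor(train[idx], train, k, exclude_index=idx)
    return d_z <= threshold * d_nn

# Hypothetical 2-D training objects; not data from the study.
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
print(knndd_accept((0.6, 0.4), train))  # test object inside the cloud
print(knndd_accept((3.0, 3.0), train))  # test object far from the cloud
```

Comparing distances directly avoids computing cell volumes, since for a fixed dimension a smaller neighbor distance corresponds to a larger local density.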

##### 2.3. Gaussian Method

When a proper probability model is assumed and the sample size is sufficient, a density method is advantageous for the one-class problem. With the optimization of the threshold, a minimum volume can be automatically found for the given probability density. When only a small number of samples is available, the simplest model is the unimodal Gaussian (normal) distribution. It fits a probability density model as follows:

$$p_{\mathcal{N}}(\mathbf{z}; \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left\{-\frac{1}{2} (\mathbf{z} - \boldsymbol{\mu})^{T} \boldsymbol{\Sigma}^{-1} (\mathbf{z} - \boldsymbol{\mu})\right\},$$

where $\boldsymbol{\mu}$ is the mean and $\boldsymbol{\Sigma}$ is the covariance matrix. Both should be estimated from the training set. For $d$-dimensional data, the number of parameters is

$$n_{\mathrm{free}} = d + \frac{1}{2} d (d + 1).$$

The method imposes a strict unimodal and convex density model on the data. The main computational effort is usually the inversion of the covariance matrix. In the case of badly scaled data or data with singular directions, it is difficult to calculate the inverse of $\boldsymbol{\Sigma}$; it can be approximated by the pseudoinverse or by introducing regularization (adding a small constant to the diagonal, i.e., $\boldsymbol{\Sigma}' = \boldsymbol{\Sigma} + \lambda \mathbf{I}$). In the latter case, the user needs to supply the parameter $\lambda$. This is also the only free ("magic") parameter that the user has to provide.

Finally, a threshold on the probability density needs to be set for distinguishing between target and outlier data. Accepting 95% of the objects requires a threshold on the squared Mahalanobis distance of

$$\theta = \left(\chi^2_d\right)^{-1}(0.95),$$

where $\left(\chi^2_d\right)^{-1}$ is the inverse cumulative $\chi^2$ distribution with $d$ degrees of freedom. This method is expected to work effectively only if the data are unimodal and convex. To obtain a more flexible density method, it can be extended to a mixture of Gaussians. Later, we will use GAUSS to denote this method.
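The Gaussian data description above can be sketched for the two-dimensional case, for example scores on the first two principal components. The sketch makes several assumptions that are not from the article: it is restricted to $d = 2$ so that the covariance inverse and the inverse $\chi^2$ distribution have closed forms, the function names and the regularization argument `lam` are hypothetical, and the points are toy values rather than data from this study.

```python
import math

def fit_gaussian_2d(points, lam=0.0):
    """Estimate the mean and (regularized) covariance entries of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1) + lam  # lam added to diagonal
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1) + lam
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis_sq(z, mean, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = z[0] - mean[0], z[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det

def chi2_inv_2dof(p):
    """Inverse chi-square CDF for 2 degrees of freedom: CDF(x) = 1 - exp(-x / 2)."""
    return -2.0 * math.log(1.0 - p)

def n_free_parameters(d):
    """Number of Gaussian parameters: d means plus d(d+1)/2 covariance entries."""
    return d + d * (d + 1) // 2

# Hypothetical 2-D training objects; not data from the study.
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
mean, cov = fit_gaussian_2d(train, lam=0.0)
theta = chi2_inv_2dof(0.95)  # threshold on the squared Mahalanobis distance

for z in [(0.6, 0.4), (3.0, 3.0)]:
    print(z, "target" if mahalanobis_sq(z, mean, cov) <= theta else "outlier")
```

For $d = 2$ the model has 5 parameters, whereas `n_free_parameters(1557)` gives 1,214,460 for a full spectrum of 1557 variables, far more than the number of samples available.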

#### 3. Experimental

##### 3.1. Sample Preparation

A total of 142 samples (bags) of black rice of three brands were purchased from local supermarkets in China. The brands came from different suppliers and are denoted A, B, and C. The samples were collected from three batches of A, two batches of B, and three batches of C, each sample coming from a different package. For A and C, forty-eight bags of rice were sampled, sixteen bags from each batch; for B, forty-six bags of rice were sampled, twenty-three bags from each batch. In total, the numbers of samples belonging to A, B, and C are 48, 46, and 48, respectively. Sample collection took about six months. The samples of each brand could serve as the target class whereas the other samples acted as the outlier class. All samples were stored in a laboratory kept at 25°C for more than 7 days in order to reach temperature equilibrium. To reduce the effect of the environment, the NIR spectra of all samples were recorded on the same day.
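The target/outlier arrangement described above can be sketched as follows. The brand counts come from the text; the function names are hypothetical, and the definitions of sensitivity (fraction of target objects accepted) and specificity (fraction of outlier objects rejected) are the usual ones and are assumed here rather than quoted from the article. The labels in the last lines are toy values, not results from the study.

```python
counts = {"A": 48, "B": 46, "C": 48}  # samples per brand, as stated in the text

def one_vs_rest(counts):
    """For each brand taken as target, count the target and outlier samples."""
    total = sum(counts.values())
    return {brand: {"target": n, "outlier": total - n} for brand, n in counts.items()}

def sensitivity_specificity(is_target, accepted):
    """Assumed definitions: accepted targets / targets, rejected outliers / outliers."""
    pairs = list(zip(is_target, accepted))
    n_target = sum(1 for t, _ in pairs if t)
    n_outlier = len(pairs) - n_target
    true_accept = sum(1 for t, a in pairs if t and a)
    true_reject = sum(1 for t, a in pairs if not t and not a)
    return true_accept / n_target, true_reject / n_outlier

splits = one_vs_rest(counts)
for brand, split in splits.items():
    print(brand, split)

# Toy decisions for six objects (four targets, two outliers); not study results.
toy_truth = [True, True, True, True, False, False]
toy_accept = [True, True, True, False, False, False]
print(sensitivity_specificity(toy_truth, toy_accept))
```

With brand A as the target, for example, the remaining 94 samples form the outlier class.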

##### 3.2. Spectral Measurement and Preprocessing

Spectra of the samples were collected on an Antaris II FT-NIR spectrometer (Thermo Scientific Co., USA) equipped with an integrating sphere module, a rotating sample cup, an InGaAs detector, and a tungsten lamp as the light source. Each sample was poured into a standard sample cup with a 50 mm diameter, and the filling height was kept at about 30 mm to prevent light leakage. An internal gold reference was used for automatic background collection. A sample cup spinner accessory for the integrating sphere sampling module, which allows multipoint reflection measurements of heterogeneous solids such as powders, granules, and pellets, was used to obtain NIR spectra of high quality. In this way, the final spectrum is the average of the spectra collected at different locations, which reduces the effect of the heterogeneity of the solids to some extent.

The NIR spectrum was measured in the region of 10,000–4000 cm^{−1} with 32 scans at a resolution of 3.856 cm^{−1}. Each spectrum contains 1557 data points. The experimental temperature and the relative humidity were controlled at around 25°C and 60%, respectively. Preprocessing of spectra is often of great importance for obtaining reasonable results, whether the task is qualitative or quantitative. Several preprocessing methods were tried. In comparison with the others, the standard normal variate (SNV) transformation achieved satisfactory performance without the need for a reference spectrum or any user decision in the computation. All spectra were therefore preprocessed by SNV. The spectral measurement was controlled by the Result software [28]. The DD toolbox was used for one-class classifier modeling [15]. All calculations were performed in MATLAB 2015b for Windows.

#### 4. Results and Discussions

##### 4.1. NIR Spectral Analysis

Figure 1 shows the raw NIR spectra of the black rice samples and the same spectra after SNV preprocessing. As seen in Figure 1, the spectra of the three types of black rice share very similar absorbance patterns in the range of 4000–11000 cm^{−1}. They can hardly be distinguished by the naked eye. General features of an NIR spectrum of solid samples include a multiplicative response to changes in particle size. SNV autoscales each spectrum individually: the mean of the spectrum is subtracted from every point and the result is divided by the standard deviation of that spectrum. It is also clear from Figure 1 that preprocessing has removed some additive and multiplicative effects.
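The SNV transformation described above can be sketched as follows. This is a minimal illustration, not the routine used in the study: the function name and the short toy spectrum are hypothetical, and the sample standard deviation (divisor $n - 1$) is assumed. The second spectrum is the first one with an artificial offset and scale factor, to show that SNV removes both.

```python
import math

def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own statistics."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in spectrum) / (n - 1))
    return [(v - mean) / std for v in spectrum]

# Toy absorbance values; not data from the study.
base = [0.2, 0.5, 0.9, 0.4, 0.7]
shifted = [1.5 * v + 0.3 for v in base]  # artificial multiplicative and additive effect

snv_base = snv(base)
snv_shifted = snv(shifted)
max_diff = max(abs(a - b) for a, b in zip(snv_base, snv_shifted))
print(f"largest difference after SNV: {max_diff:.2e}")
```

Because each spectrum is corrected using only its own mean and standard deviation, no reference spectrum or tuning parameter is needed.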