Abstract

Inductively coupled plasma mass spectrometry (ICP-MS) was used to determine the content of 40 elements in 38 soybean (Glycine max) samples from 4 countries. Multivariate statistical methods, such as principal component analysis (PCA), were applied to the resulting data to establish the provenance of the soybeans. Although soybean is widely marketed in many countries, no universal method is used to discriminate its origin. Our study is an initial step toward identifying the geographical origin of commercial soybean marketed in Vietnam. The analysis showed significant differences in the means of 33 of the 40 analyzed elements among the soybean samples of the 4 countries, namely, 11B, 27Al, 44Ca, 45Sc, 47Ti, 55Mn, 56Fe, 59Co, 60Ni, 63Cu, 66Zn, 69Ga, 75As, 78Se, 85Rb, 88Sr, 89Y, 90Zr, 93Nb, 95Mo, 103Rh, 137Ba, 163Dy, 165Ho, 175Lu, 178Hf, 181Ta, 182W, 185Re, 197Au, 202Hg, 205Tl, and 208Pb. The PCA showed that the soybean samples could be classified correctly according to their origin. This research identified a set of key markers for discriminating the origin of soybean and can serve as a basis for future studies that combine elemental composition analysis with statistical classification methods for accurate provenance establishment of soybean.

1. Introduction

In the last few years, numerous advances in food authentication using fingerprinting techniques have been reported [14], especially for provenance determination. Most of the methods combine an analytical technique with one or more multivariate statistical analyses. First, the samples are analyzed by a suitable analytical technique to acquire the data of interest, mostly trace element contents or isotope ratios. Then, these data are examined by multivariate statistical analysis [2, 3] to identify or categorize the studied agricultural products according to their geographical origin. This method relies on the assumption that the composition of the soil in which an agricultural product is grown is reflected in the chemical composition of that product, such as wine [4–11], coffee [12], tea [13], olive oil, or fruit juice [14, 15], at least for certain elements [14–16]. For the technique to succeed, suitable elements or isotopes must be selected carefully so that the chosen markers reflect the soil geochemistry and the products can therefore be discriminated correctly. Only a few elements satisfy this requirement. In addition, reliable information on the elemental composition of the sample, mostly at trace levels, is a must if this method is to be applied with any degree of success. The most suitable technique for this purpose is inductively coupled plasma mass spectrometry (ICP-MS), which can determine multiple elements in a sample [4–10].

Furthermore, the most common techniques used for food authenticity and traceability include isotope ratio analysis, liquid and gas chromatography, elemental analysis, spectroscopic techniques, DNA-based techniques, and sensor techniques [17]. Spectroscopic techniques include vibrational [18], hyperspectral [19], fluorescence, and nuclear magnetic resonance [20]; these techniques are rapid and cost-effective and involve little or no sample preparation [21]. For example, Raman spectroscopy combined with a support vector machine has been used to identify rice-producing areas in China [22] with a correct classification rate of nearly 90%, and near-infrared spectroscopy has likewise been combined with multivariate analysis. However, the main drawback of these techniques is low accuracy due to lower sensitivity and high noise.

In previous work, Yuji et al. [16] successfully distinguished Japanese soybean from Chinese soybean (Glycine max) and classified soybeans among regions of Japan by combining ICP-MS analysis with a linear discriminant analysis (LDA) model built on 6 elements (Ba, Ca, Mn, Nd, W, and Ni) selected from 24 elements by backward stepwise regression. In addition, commercial energy-dispersive X-ray fluorescence (ED-XRF) successfully measured 9 elements (Mg, K, Ca, Mn, Fe, Ni, Cu, Zn, and Rb) in 296 soybean samples from 5 producing areas of northern China (Henan, Inner Mongolia, Xinjiang, Heilongjiang, and Liaoning). The combination of a multilayer perceptron (MLP) and ED-XRF overcomes the analytical disadvantages of ICP-MS and provides a novel and fast testing method with a powerful classification capacity, reaching an accuracy rate of 96.2% [12].

In Vietnam, the soybean planting area is not stable; domestic soybean production, which reaches nearly 200,000 tons/year, is only enough to supply about 8–10% of demand. Because of the high demand, imports exceed 1 million tons/year, far more than domestic production. As in Japan and Korea, the lack of strict regulations on the management of agricultural products has led to the adulteration of authentic products with fake ones to increase profits. Moreover, the current regulations only mention the application of information technology to food traceability rather than the identification of geographical origin by chemical methods. Therefore, drawing on the experience of other countries, the Vietnamese government will need to review and amend the regulations to establish geographical tracing methods based on chemical analysis.

The difference in the elemental content of soybean samples is related to the elemental content of the soil, and this is the key to distinguishing geographical origin [23]. The growth and quality of soybeans are significantly affected by inorganic elements; for example, selenate at low concentrations (0.07 to 0.20 mg Se per kg seed) can promote the growth of soybeans and reduce cadmium [24]. The use of organic fertilizers and soil improvers, such as leonardite, may enrich the contents of macronutrients (Mg, Ca, K, and S) and micronutrients (Fe, Cu, Mn, and Zn).

In this study, the trace element composition of soybeans from 4 countries was compared to identify and classify them according to their origin. The results show that elemental fingerprinting is a very promising method for collecting data to ascertain the origin of soybeans. Previous studies have pointed out the factors that influence elemental composition: anthropogenic factors, such as the use of fertilizers and pesticides [6] or pollution [25], and natural factors, such as heavy rains during the growing season or irrigation water. This study reports a new approach to discriminating the origin of domestic and imported soybeans in the Vietnamese food market by combining ICP-MS analysis with chemometric methods. Because of its high accuracy and sensitivity, this approach has already been used to classify medicinal plants [10] and several types of foods and drinks, for example, tea [26], potato [27], wine [28], and honey [29].

2. Materials and Methods

2.1. Materials

Thirty-eight soybean samples (15 from Vietnam, 8 from Canada, 9 from the US, and 6 from Brazil) packed in 2019 (Table 1) were used. Soybeans VN01–VN09 were provided by 9 supermarkets in Vietnam, whereas soybeans VN10–VN15 were obtained from large residential markets in Vietnam. Samples of imported soybeans (Can01–Can08, US01–US09, and Bra01–Bra06) were also provided by supermarkets in Vietnam. The Vietnamese samples came from supermarkets and public markets in Hanoi and Hai Phong, which sold soybeans originating from the northern regions of Vietnam or from local farms (Ha Giang and Hanoi); from public markets in Da Nang and Can Tho and the Saigon market; and from a public market in Ho Chi Minh City, which sold soybeans from Dong Nai. The samples imported from Brazil all originated from Mato Grosso; Canadian samples Can01 to Can05 were from Ontario and Can06 to Can08 from Manitoba; US samples US01 and US04 originated from Iowa, and the other US samples came from Illinois. All collected samples were stored at −20°C in a deep freezer until analysis.

2.2. Chemicals and Reagents

Nitric acid 65% (HNO3) and hydrogen peroxide 30% (H2O2) solutions were purchased from Merck, USA. Ultrapure deionized water with a resistivity of 18.2 MΩ·cm was obtained from a Milli-Q Plus water purification system (Millipore, Bedford, MA, USA). A multielement standard solution containing twenty-one elements (11B, 27Al, 44Ca, 55Mn, 56Fe, 59Co, 60Ni, 63Cu, 66Zn, 69Ga, 75As, 78Se, 85Rb, 88Sr, 137Ba, 197Au, 202Hg, 205Tl, 208Pb, 24Mg, and 28Si; 10 mg/L each element) (TraceCERT, periodic table mix 1 for ICP, product no. 92091, Lot: BCBW5563) and a solution of eight rare-earth elements (45Sc, 89Y, 163Dy, 165Ho, 139La, 159Tb, 169Tm, and 175Lu; 10 mg/L each element) were provided by Sigma-Aldrich. A standard solution containing 50 µg/L of 47Ti, 90Zr, 93Nb, 95Mo, 103Rh, 178Hf, 181Ta, 182W, 185Re, 232Th, and 238U in 1% HNO3 was used to determine the sensitivity factors for all elements across the entire mass range for measurements of diluted samples made in semiquantitative mode. Analytical grade ethanol was used for preparing matrix-matched standards; when digested samples were analyzed, ethanol was excluded from the calibration solution. The internal standard (45Sc, 49In, 83Bi, 89Y, 159Tb, and 32Ge) for quantitative analysis was prepared in 1% HNO3 for diluted samples and in 0.14 M HNO3 for digested samples. Similarly, solutions of 1% HNO3 and 0.14 M HNO3, each containing 50 µg/L of the internal standard, were used as blanks for the analysis of diluted and digested samples, respectively. Both the standards and the internal standard in this study were prepared by diluting 1000 mg/L standard stock solutions.

2.3. Sample Preparation and ICP-MS Measurements

A 0.5 g portion of each soybean sample was accurately weighed into a Teflon tube, and 4 mL of concentrated HNO3 (Merck, Germany) and 1 mL of 30% H2O2 (Merck, Germany) were added. The tubes were then transferred to a MARS6 microwave oven (CEM, US) and digested at a power of 1000–1800 W and a temperature of 190°C for 20 minutes. The digests were cooled to room temperature and then diluted to the mark (25 mL) with deionized water before being analyzed on an Agilent 7900 ICP-MS system (Agilent Technologies, Tokyo, Japan). The standard curve was built using the ICP multielement standard solution at six concentrations: 1.0, 2.0, 5.0, 10.0, 20.0, and 50.0 μg/L. The content of each element was calculated from the standard curves established under the same conditions [30–35]. The instrument was used to measure 40 elements in the soybean samples: 11B, 24Mg, 27Al, 28Si, 44Ca, 45Sc, 47Ti, 55Mn, 56Fe, 59Co, 60Ni, 63Cu, 66Zn, 69Ga, 75As, 78Se, 85Rb, 88Sr, 89Y, 90Zr, 93Nb, 95Mo, 103Rh, 137Ba, 139La, 159Tb, 163Dy, 165Ho, 169Tm, 175Lu, 178Hf, 181Ta, 182W, 185Re, 197Au, 202Hg, 205Tl, 208Pb, 232Th, and 238U. The ICP-MS operating parameters were an RF power of 1550 W, RF matching of 2.0 V, cell entrance voltage of −40 V, cell exit voltage of −60 V, cell energy discrimination of 5.0 V, and spray chamber temperature of 2°C. Argon was used as the carrier gas at a flow rate of 1.09 L/min, and helium was used to eliminate interferences at 4.3 L/min. Quantitation was based on matrix-matched multielement standards prepared in 1% HNO3 [35–38].
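As a rough illustration of the external calibration described in this section, the following sketch fits a straight line to the six standard concentrations and converts a measured intensity into a content per gram of sample. The intensity values, the function names, and the use of an unweighted linear fit are assumptions made for illustration only; the instrument software used in the study may apply a different regression model.

```python
# Hypothetical sketch: linear calibration and back-calculation to sample content.
# Standard levels, digest volume, and sample mass follow the text;
# all intensities are invented for illustration.

def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def content_ug_per_g(intensity, slope, intercept, volume_ml, mass_g):
    """Convert a raw intensity to micrograms of element per gram of sample."""
    conc_ug_per_l = (intensity - intercept) / slope    # concentration in the digest
    return conc_ug_per_l * (volume_ml / 1000.0) / mass_g

standards = [1.0, 2.0, 5.0, 10.0, 20.0, 50.0]          # ug/L
counts = [150.0 + 1000.0 * c for c in standards]       # invented, perfectly linear
slope, intercept = fit_line(standards, counts)
result = content_ug_per_g(8150.0, slope, intercept, volume_ml=25.0, mass_g=0.5)
print(round(result, 3))
```

With real data, the residuals of the fit would also be inspected before the curve is used for quantitation.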

2.4. Method Validation

In this study, instrument detection limits (IDLs) were calculated from the raw intensity data of the standard and the blank (ultrapure 2% nitric acid matrix) according to the following equation: IDL = 3 × SDblank × Cx/(Sx − Sblank), where SDblank is the standard deviation of the intensities of multiple blank measurements, Cx is the concentration of the standard, Sx is the mean signal of the standard at concentration Cx, and Sblank is the mean signal of the blank. Method detection limits (MDLs) were calculated as follows: MDL = IDL × constant volume/sample weight.
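The two detection-limit formulas can be sketched as follows. This is a minimal illustration, assuming that Cx denotes the concentration of the standard (our reading of the equation) and that the IDL is expressed in µg/L, the volume in L, and the sample weight in g; the blank and standard intensities are invented.

```python
from statistics import mean, stdev

def instrument_detection_limit(blank_signals, std_conc, std_signal):
    """IDL = 3 * SD_blank * C_x / (S_x - S_blank), in the units of std_conc."""
    return 3.0 * stdev(blank_signals) * std_conc / (std_signal - mean(blank_signals))

def method_detection_limit(idl_ug_per_l, volume_l, mass_g):
    """MDL = IDL * constant volume / sample weight, giving ug per g of sample."""
    return idl_ug_per_l * volume_l / mass_g

blanks = [10.0, 12.0, 11.0, 9.0, 13.0]      # invented blank intensities (counts)
idl = instrument_detection_limit(blanks, std_conc=1.0, std_signal=1011.0)
mdl = method_detection_limit(idl, volume_l=0.025, mass_g=0.5)
print(idl, mdl)
```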

Calibration verification standards were prepared from single-element ICP standards (Merck) and consisted of 3 different sets: Ca and Mg for the high standard series, and two sets for the low standard series (Al, B, Cu, Rb, Sr, and Zn; and Cd, Co, Cs, Ni, Tl, and V). The calibration verification standards were measured after every 10 samples.

Two soybean samples were prepared in duplicate. Matrix interferences were examined by evaluating an interference check sample composed of 56Fe, Ca, 63Cu, and 66Zn. In addition, serial dilution and spike recovery tests were performed on the soybean samples. The serial dilution check was performed on one sample by diluting it 1 : 10 and then 1 : 3 (a final dilution of 1 : 30). Several elements were spiked into the soybean samples at concentrations of 20 and 100 µg/L for 27Al, 63Cu, and 88Sr and of 100 and 500 µg/L for 11B, 55Mn, 66Zn, and 85Rb [39].
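The spike recovery and serial dilution checks can be evaluated with simple arithmetic. The sketch below uses the conventional definitions (recovery as the recovered fraction of the added spike, and dilution agreement as the relative difference between dilution-corrected results); these definitions, the function names, and all numerical values are assumptions for illustration, since the text does not state the formulas or acceptance limits used.

```python
def spike_recovery_percent(unspiked, spiked, added):
    """Percentage of the added spike that is recovered in the spiked sample."""
    return (spiked - unspiked) / added * 100.0

def dilution_difference_percent(result_a, factor_a, result_b, factor_b):
    """Relative difference between two dilution-corrected results, in percent."""
    corrected_a = result_a * factor_a
    corrected_b = result_b * factor_b
    return (corrected_b - corrected_a) / corrected_a * 100.0

# Invented example: a 20 ug/L spike, and a sample measured at 1:10 and 1:30.
recovery = spike_recovery_percent(unspiked=35.0, spiked=54.0, added=20.0)
difference = dilution_difference_percent(12.0, 10, 3.9, 30)
print(round(recovery, 1), round(difference, 1))
```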

2.5. Statistical Analysis

Data acquisition and processing were performed in Microsoft Excel 2016 (Microsoft Corporation, USA). For normalization, the difference between each elemental content and the minimum content of that element among the samples was divided by the difference between the maximum and minimum contents of the element. Principal component analysis (PCA) was performed in STATISTICA 12 (Dell Software, USA), and hierarchical clustering analysis (HCA) was implemented in R (R Foundation for Statistical Computing, Vienna, Austria).

3. Results and Discussion

3.1. Selection of Elements for Multivariate Analysis

Recently, the public has paid significant attention to the toxicity of potentially harmful chemical substances in food [40, 41]. These compounds can cause serious negative effects on human health, such as food poisoning or cancer. As a result, there is increasing demand for scientific studies in this field to extend our knowledge of the impact of hazardous components in daily food [42–44]. Among daily foods, soybean is one of the most frequently studied; these studies mostly focus on the content of heavy metals (such as 75As, 63Cu, 48Cd, and 208Pb), other inorganic compounds, and organic substances [45–49]. Besides studies of safe consumption limits, soybean has also been examined for other purposes, such as fertilizer residues or polyphenols [50–53]. Previous studies indicate that the origin of samples, also known as the history of the product, can be elucidated by analyzing the composition of trace elements [10, 54–57]. This is especially true for the soybean matrix, as soybean samples are relatively homogeneous. Moreover, collecting a large number of soybean samples that are representative of a large area is feasible and fairly easy.

The results of the analysis of the soybean samples are summarized in Table 2 for the 40 elements (11B, 24Mg, 27Al, 28Si, 44Ca, 45Sc, 47Ti, 55Mn, 56Fe, 59Co, 60Ni, 63Cu, 66Zn, 69Ga, 75As, 78Se, 85Rb, 88Sr, 89Y, 90Zr, 93Nb, 95Mo, 103Rh, 137Ba, 139La, 159Tb, 163Dy, 165Ho, 169Tm, 175Lu, 178Hf, 181Ta, 182W, 185Re, 197Au, 202Hg, 205Tl, 208Pb, 232Th, and 238U).

To verify the measurement results, the data were compared with black soybean data [16]. Since these are two different types of soybean, there were several differences in the minerals absorbed by the plant and in their concentrations. Thirteen elements were shared between the two data sets, of which 12 could be compared because the measured 24Mg content was below the method detection limit of this experiment. The Ca concentrations in samples from the four countries were lower than that in black soybean from Japan, with the highest being only 908 µg/g compared with 1400 µg/g in Japanese black soybean. The concentrations of 55Mn and 182W were 2 to 4 times higher than those in Japanese black soybean. The other elements showed mixed results: for a given element, soybeans from some countries had higher concentrations and others lower concentrations than Japanese black soybean. These results indicate that the method was reliable and suitable for further analysis.

The concentration ranges of most elements overlap among the four regions. However, the concentration level of these elements in each region can still be inferred from the variation of each element’s concentration in that region. Various binary and ternary scatterplots of different element combinations were examined. In general, several combinations of elements could sufficiently distinguish between any two of the regions, but they were not sufficient to classify all four regions. Thus, scatterplots alone are not adequate for categorizing the four groups. Since the sample size (the number of soybean samples) was relatively small compared with the number of variables (the number of analyzed element concentrations), reducing the number of variables was essential for effective multivariate statistical analysis. An ANOVA test found significant differences in group means at the 95% confidence level for the following elements: 11B, 27Al, 44Ca, 45Sc, 47Ti, 55Mn, 56Fe, 59Co, 60Ni, 63Cu, 66Zn, 69Ga, 75As, 78Se, 85Rb, 88Sr, 89Y, 90Zr, 93Nb, 95Mo, 103Rh, 137Ba, 163Dy, 165Ho, 175Lu, 178Hf, 181Ta, 182W, 185Re, 197Au, 202Hg, 205Tl, and 208Pb. In addition, a few elements (139La, 24Mg, 28Si, and 238U) were removed because of large analytical uncertainty, for two reasons: high polyatomic background interference and concentration levels close to the MDL of the method.
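The element screening step can be illustrated with a one-way ANOVA computed for a single element across the four country groups. The sketch below computes only the F statistic and its degrees of freedom; the data are invented, and the function name is hypothetical. Obtaining the p-value additionally requires the F distribution, which is available, for example, through `scipy.stats.f_oneway`.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of measurements."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the individual values around their own group mean
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Invented contents of one element in four groups (e.g., four countries)
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0], [8.0, 9.0, 10.0]]
f_stat, df_b, df_w = one_way_anova_f(groups)
print(f_stat, df_b, df_w)
```

An element would be retained as a candidate marker when the F statistic exceeds the critical value for the chosen significance level and degrees of freedom.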

The ICP-MS results for the samples (Table S1) indicated that the contents of heavy metals (75As, 63Cu, 48Cd, and 208Pb) and toxic metals (137Ba and 205Tl) in all tested samples were lower than the limits set by the Ministry of Health of Vietnam (0.1 mg/kg for 48Cd and 0.2 mg/kg for 208Pb) [58]. Thus, these samples met the requirements of the Vietnamese food market. The ICP-MS data were then used for further multivariate statistical analysis.

3.2. Geographically Original Discrimination of Soybeans

Since the contents differed by up to a thousandfold among elements, min-max normalization was applied to bring all values between 0 and 1. In this way, all elemental values were placed on a common scale. Specifically, the difference between the content of an element in a sample and the minimum content of that element among the samples was divided by the difference between the highest and lowest values of the element. The normalized data were analyzed by multivariate statistical methods, namely HCA and PCA, to reduce the data dimension and provide insight into the discrimination of the samples. On the one hand, the HCA model classified the samples by measuring similarities through non-Euclidean distance, as shown in Figure 1. As can be seen, the 38 samples were sharply clustered into four groups according to their origins. While the Canadian and US samples were relatively similar to each other, the dendrogram showed a marked separation of the soybeans from Vietnam and Brazil.
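The min-max normalization described above can be sketched in a few lines. The element names and values below are invented, and the handling of a constant column (returning zeros) is our assumption, as the text does not address that case.

```python
def min_max_normalize(values):
    """Scale a list of contents to [0, 1] using (x - min) / (max - min)."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # Assumption: a constant column carries no information, so map it to 0
        return [0.0] * len(values)
    span = highest - lowest
    return [(v - lowest) / span for v in values]

# Invented contents on very different scales, normalized element by element
contents = {
    "element_A": [2.0, 4.0, 6.0],
    "element_B": [1000.0, 3000.0, 2000.0],
}
normalized = {name: min_max_normalize(col) for name, col in contents.items()}
print(normalized)
```

Normalizing each element separately prevents high-abundance elements from dominating the distance and variance calculations in HCA and PCA.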

On the other hand, the data set was further processed by PCA not only to distinguish soybean origins but also to identify key elements for the discrimination. According to the scree plot of eigenvalues (Figure 2), the first three principal components (PCs) accounted for 90% of the total variance, with PC1 and PC2 explaining 51.6% and 30.8% of the sample variability, respectively.
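The explained-variance figures read from a scree plot can be reproduced in principle with a PCA computed by singular value decomposition. The sketch below uses NumPy on random placeholder data and assumes a covariance-based PCA of the normalized data; the text does not state which option was selected in STATISTICA, so the function and its outputs are illustrative only.

```python
import numpy as np

def pca_svd(data):
    """Return (explained_variance_ratio, scores, loadings) for a samples x variables array."""
    centered = data - data.mean(axis=0)           # center each variable (column)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    variances = s ** 2 / (data.shape[0] - 1)      # eigenvalues of the covariance matrix
    ratio = variances / variances.sum()           # fraction of variance per PC
    scores = u * s                                # sample coordinates (score plot)
    loadings = vt.T                               # variable weights (loading plot)
    return ratio, scores, loadings

rng = np.random.default_rng(0)
demo = rng.random((38, 6))                        # placeholder: 38 samples, 6 variables
ratio, scores, loadings = pca_svd(demo)
cumulative = np.cumsum(ratio)                     # cumulative variance, as in a scree plot
print(np.round(ratio, 3), np.round(cumulative, 3))
```

The first two columns of `scores` and `loadings` correspond to the coordinates that would be drawn on a PC1 versus PC2 score plot and loading plot, respectively.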

Figure 3(a) shows the PCA score plot of the 38 soybean samples, which were sharply separated by geographical origin. On the loading plot (Figure 3(b)), variables with the highest absolute values on the vertical or horizontal axis had a greater influence on the differentiation of the cases on the score plot. Figure 3(b) shows that more elements had positive loadings on PC2 and negative loadings on PC1. 69Ga, 85Rb, and 89Y contributed most to the separation on PC1, while 103Rh and 181Ta had the strongest effect on PC2. In addition, variables whose position on the loading plot is similar to the position of a group of cases on the score plot are the characteristic variables of that group. In other words, an element is a “key” for the classification of a certain sample group if the two occupy the same position on the two plots. As can be seen, Vietnamese soybeans were distinguished by positive PC1 and PC2 scores. On the loading plot, 78Se, 88Sr, 93Nb, and 137Ba had positive values on both of the first two PCs, at positions similar to those of the Vietnamese samples on the score plot, which could explain the clustering of soybeans from Vietnam. In addition, the X and moving R charts (Figures S1a–S1d) showed that soybeans from Vietnam had the highest contents of those four elements compared with the imported samples.

Next, the marked separation of the soybeans exported from Brazil was driven by a variety of elements, such as 47Ti, 55Mn, 66Zn, 95Mo, 163Dy, and 205Tl, since the contents of these elements were considerably higher in the Brazilian samples than in the others (Figures S2a–S2f). For example, 47Ti in Brazilian soybeans ranged from 31 to 42 ppm, while the values for samples from other sources were mostly under 30 ppm. Similarly, the 55Mn, 66Zn, 95Mo, and 205Tl contents in Brazilian soybeans could be 1.5 to 10 times higher than in other samples. Notably, although present at a low concentration, 163Dy was found only in Brazilian soybeans (Table S1).

Although they clustered at nearby positions on the PCA score plot, the Canadian and US samples could be discriminated by certain elements on the basis of the loading plot and the moving charts, as shown in Figure 3(b) and Figures S3 and S4. While 175Lu, which was highest in the Canadian soybeans among all samples, might be the key to identifying this group (Figure S3 and Table S1), 59Ti and 178Hf were the markers that distinguished soybeans from the US, owing to the higher contents of these elements in this group than in the others (Figure S4 and Table S1). The nearby positions of the two clusters could be explained by the similar 103Rh content (Figure S5) of the US and Canadian soybeans: this element had the strongest negative effect on PC2, and the two groups also shared negative PC2 values.

Overall, both HCA and PCA illustrated the clustering of soybean samples according to their geographical origins. Samples from Vietnam could be distinguished from the imported groups by their higher contents of 78Se, 88Sr, 93Nb, and 137Ba, whereas Brazilian soybeans could be classified on the basis of several key elements, such as 47Ti, 55Mn, 66Zn, 95Mo, 163Dy, and 205Tl. Meanwhile, the discrimination of the US and Canadian soybeans depended on the characteristic contents of 175Lu for samples from Canada and of 59Ti and 178Hf for samples from the US. To the best of our knowledge, this is the first study to discriminate soybeans in the Vietnamese food market using an ICP-MS-based metallomics approach.

4. Conclusions

This study showed that soybeans from 4 countries can be classified according to their geographical origin, giving further evidence of the ability of multivariate statistical analysis based on trace element data to establish provenance. The elemental contents of soybean from Vietnam were specific enough to distinguish it from imported soybean; the samples from Brazil, Canada, and the USA could also be classified clearly. Therefore, the developed method for the determination of 33 elements by ICP-MS could be used to authenticate the geographical origin of soybeans grown in Vietnam as well as of samples imported from other countries. It could be considered a promising, rapid, and cost-effective method for evaluating the origin of soybean and other foods.

Data Availability

The majority of the data used in this study are included in the article. Other data can be made available upon request from the corresponding author.

Conflicts of Interest

All authors declare no conflicts of interest.

Acknowledgments

This research was funded by the Vietnam Academy of Science and Technology, under grant numbers TĐNDTP.01/19-21 and QTCZ01.01/20-21. The authors thank the Center for Research and Technology Transfer, Vietnam, for partly supporting this research, and Do Hoang Giang, Tran Ha Minh Duc, and Nguyen Tien Dat for their helpful advice.

Supplementary Materials

Figures S1–S5 and Table S1 are provided.