Abstract

To improve the risk identification ability of the technical support system for food safety supervision, an online screening platform for food risk substances (hereafter referred to as the “platform”) was established. The platform targets the qualitative analysis of unknown compounds and consists of three parts: a standard spectrum library, a screening model, and an online comparison module. The standard library contains the standard spectra of 527 food risk substances acquired by high-performance liquid chromatography/high-resolution mass spectrometry. The screening comparison algorithm, the core of the screening model, was obtained by improving an existing spectral library search algorithm. The inspector uploads the original spectrum file through the online comparison module, which calls the corresponding script to convert it into a standard spectrum file and then applies the screening comparison algorithm to perform the comparison online in real time. The comparison results are used to determine whether the sample under test contains any of the food risk substances in the standard library, which provides a preliminary screening of potential food risk substances. The platform supports the spectrum data formats of mainstream instrument manufacturers. The standard spectrum database can be jointly built and shared by cooperating laboratories, which effectively broadens the range of food risk substances covered. Through laboratory comparison, data calibration, and model optimization, the screening accuracy of the platform exceeds 97%. Because screening is performed online over the Internet, the platform greatly facilitates risk investigation and control by national food safety inspection and testing institutions. More broadly, a screening platform for food risk substances built on high-performance liquid chromatography/high-resolution mass spectrometry, the Internet, big data, and other technologies provides a new technical means for food safety risk management and control, and it can build a bridge between the screening of risk substances and illegally added substances on the one hand and risk assessment, management, and control on the other.

1. Introduction

With the development of the market economy and the improvement of the country’s overall strength, food quality has gradually improved in China, which has been the largest food producer and consumer since 2010. However, because of the large volume of food consumption and the long food industrial chain, China has witnessed numerous food safety incidents, which have aroused widespread public concern. The Chinese government has strengthened the monitoring of food risks through a series of policies and measures and has established a food safety risk management and control mechanism based on source control, process control, and end-product monitoring. Within this mechanism, a sampling inspection and risk-screening system has been established at the technical level. This greatly improves the capacity for food safety management and control and significantly alleviates food safety problems [1].

As the basic supporting technology of food testing, instrumental analysis has developed rapidly in recent years. Liquid chromatography (LC) and gas chromatography (GC) perform excellently in the separation of compounds. Given the high selectivity and high sensitivity of mass spectrometry (MS) in the qualitative and quantitative analysis of trace substances, many countries rely on GC–MS, LC–MS [2–4], and other analytical techniques for the detection and screening of food risk substances. LC–MS covers a wide analytical range and can detect almost all compounds, which solves the problem that GC cannot analyze thermally unstable compounds. It has a strong ability to separate substances, and even when the analyzed mixture is not completely separated, it can perform qualitative and quantitative analysis through characteristic ion mass chromatograms to obtain the structural information and molecular weight of each component. The detection sensitivity is high, and sample detection at the microgram level is possible. The analysis is also fast: the detection time of a single sample is generally less than 15 minutes [5–9]. Detecting and screening food risk substances with LC–MS requires not only the relevant detection equipment but also professional screening software that includes standard MS databases of compounds and comparison algorithms [10–12]. At present, most inspectors in various countries are limited to the professional screening software provided by instrument and equipment manufacturers when screening and comparing food risk substances. The standard MS databases contained in this software are expensive and still do not cover all risk substances. Screening procedures for risk substances are cumbersome and costly in manpower and material resources [13, 14]. Given the wide variety of food safety risk substances and the lack of professional databases shared over a network, establishing universal, shared screening software for high-performance liquid chromatography/high-resolution mass spectrometry that works across instrument brands and can quickly screen for risk substances in food has become a major research subject for the technical support institutions of food safety regulation [15–17].

In view of the technical bottlenecks encountered by food inspection agencies in the screening of food risk substances, the relevant team of the National Institutes for Food and Drug Control conducted extensive investigation and research and integrated technologies such as high-performance liquid chromatography/high-resolution mass spectrometry, the Internet, and big data [18, 19]. The team established a food risk substance screening platform for food inspectors across the country, which has been officially launched. The platform follows the European Union’s analytical method guidelines [20] and aims to qualitatively analyze unknown compounds in mass spectrometry files from different instrument manufacturers. When screening for food risk substances, the inspectors preprocess the relevant food samples according to the screening preprocessing technical standards researched and formulated by the National Institutes for Food and Drug Control. High-performance liquid chromatography/high-resolution mass spectrometry is then used to perform the detection. After testing, the generated data files are uploaded over the Internet to the online comparison module of the screening platform. The online comparison module calls the screening model for real-time analysis and comparison and then returns the screening results to the inspectors. The inspectors combine the screening results with other information to make a comprehensive judgment and complete the preliminary screening of risk substances.

The platform can automatically identify the original mass spectrometry files of instruments from various manufacturers and convert them to a unified data format; hence, there is no restriction on the brand or version of the instrument. The standard library of the platform can be jointly built and shared by cooperating laboratories, which effectively broadens the range of food risk substances in the database and gives the platform good scalability. The screening model of the platform is based on the SS combination algorithm, which has been optimized and improved through a large number of screening comparison experiments to ensure the accuracy and scientific soundness of the screening results. Because screening is performed online over the Internet, the platform is more efficient than the traditional risk-screening workflow and can greatly facilitate the risk investigation and control work of food safety inspection agencies.

2. Materials and Methods

The platform consists of three parts: a standard spectrum library, a screening model, and an online result comparison module. The standard spectrum library serves as the underlying database for risk screening. The screening model is used to screen and compare the risk substances. The online result comparison module allows users to upload spectrometry files and obtain screening results in real time. The web pages of the platform are developed in Java with mainstream technologies such as SpringBoot (https://spring.io/projects/spring-boot) and jQuery (https://jquery.com/). The underlying model is developed in Python, mainly using third-party libraries such as pymzML and Pandas [21].

2.1. Standard Spectrum Library

The platform builds a standard spectrum library from high-resolution MS data for 527 banned and restricted compounds found in food matrices [22, 23]. At present, the spectrum library mainly integrates standard spectral data acquired on Agilent instruments. Each entry covers the mass-to-charge ratio of the parent ion, the mass-to-charge ratios of the first 15 secondary fragment ions with their relative peak intensities, the retention time, and some basic information about the compound. The content of the high-resolution spectrum library is shown in Table 1, with methomyl as an example.

2.2. Screening Model

The screening model is the core of the whole platform, and the screening comparison algorithm is the core of the screening model. The algorithm is obtained by improving an existing spectral library search algorithm, specifically, the SS combination algorithm. The SS combination algorithm, proposed by Stein and Scott, includes the cosine similarity algorithm [24] (also called the weighted dot-product algorithm), represented here as $S_C(U_w, V_w)$, and the peak ratio algorithm, represented here as $S_D(U_w, V_w)$ [25, 26]. The calculation formula of the cosine similarity algorithm is expressed as follows:

$$S_C(U_w, V_w) = \frac{\sum U_w V_w}{\sqrt{\sum U_w^{2}}\,\sqrt{\sum V_w^{2}}},$$

where $V$ represents the compound in the library, $U$ represents the unknown compound, $\omega$ is the weighted mass-to-charge ratio and peak intensity information, and $U_w$ and $V_w$ are the matrix forms of $\omega$ for the two spectra. $\omega$ is obtained by raising the mass-to-charge ratio and the relative peak intensity of the compound each to the power of a weighting factor and multiplying the results. The calculation formula of $\omega$ is expressed as follows:

$$\omega = \alpha^{x}\,\beta^{y},$$

where $x = 1.3$ and $y = 0.53$ are weighting factors, and $\alpha$ and $\beta$ refer to the mass-to-charge ratio and relative peak intensity, respectively. The calculation formula of the peak ratio algorithm is expressed as follows:

$$S_D(U_w, V_w) = \frac{1}{N_{U \wedge V}} \sum_{i=2}^{N_{U \wedge V}} \left( \frac{u_i}{u_{i-1}} \cdot \frac{v_{i-1}}{v_i} \right)^{n},$$

where $u_i$ and $v_i$ are nonzero peaks with the same mass-to-charge ratio in the unknown and library spectra, and $N_{U \wedge V}$ is the number of such shared peaks. When the peak ratio of the former is smaller than that of the latter, $n = 1$; otherwise, $n = -1$. Finally, $S_C$ and $S_D$ are each multiplied by the corresponding weight and then combined to calculate the final similarity. The calculation formula is as follows:

$$S_{SS} = \frac{N_U\,S_C(U_w, V_w) + N_{U \wedge V}\,S_D(U_w, V_w)}{N_U + N_{U \wedge V}},$$

where $N_U$ is the number of nonzero peaks in the unknown spectrum.
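The two similarity measures and their combination can be sketched in Python as follows. This is a minimal illustration of the classical Stein and Scott composite as it is commonly stated, not the platform's improved algorithm. The function names, the peak-alignment step, and the 2 mDa fragment tolerance are assumptions made for the example; only the exponents 1.3 and 0.53 come from the text.

```python
import math

# Weighting exponents quoted in the text: x for m/z, y for relative intensity.
X_MZ, Y_INT = 1.3, 0.53


def weight(mz, intensity, x=X_MZ, y=Y_INT):
    """Weighted peak value: omega = mz**x * intensity**y."""
    return (mz ** x) * (intensity ** y)


def align(unknown, library, tol=0.002):
    """Pair peaks whose m/z agree within tol (Da, an assumed value).

    Each spectrum is a list of (mz, relative_intensity) tuples. A peak
    with no partner is paired with intensity 0. Returns rows of
    (mz, unknown_intensity, library_intensity) sorted by m/z.
    """
    rows, used = [], set()
    for mz_u, i_u in unknown:
        match = next((j for j, (mz_v, _) in enumerate(library)
                      if j not in used and abs(mz_u - mz_v) <= tol), None)
        if match is None:
            rows.append((mz_u, i_u, 0.0))
        else:
            used.add(match)
            rows.append((mz_u, i_u, library[match][1]))
    rows += [(mz_v, 0.0, i_v) for j, (mz_v, i_v) in enumerate(library)
             if j not in used]
    return sorted(rows)


def cosine_similarity(rows):
    """Cosine of the angle between the two weighted intensity vectors."""
    uw = [weight(mz, u) for mz, u, _ in rows]
    vw = [weight(mz, v) for mz, _, v in rows]
    den = math.sqrt(sum(a * a for a in uw)) * math.sqrt(sum(b * b for b in vw))
    return sum(a * b for a, b in zip(uw, vw)) / den if den else 0.0


def peak_ratio(rows):
    """Average ratio of neighbouring shared peaks; each term is at most 1."""
    shared = [(u, v) for _, u, v in rows if u > 0 and v > 0]
    if len(shared) < 2:
        return 0.0
    total = 0.0
    for (u0, v0), (u1, v1) in zip(shared, shared[1:]):
        r = (u1 / u0) * (v0 / v1)
        total += r if r < 1 else 1 / r  # exponent n = +1 or -1
    return total / len(shared)


def composite_similarity(unknown, library, tol=0.002):
    """Combine both measures, weighted by peak counts (classical form)."""
    rows = align(unknown, library, tol)
    n_u = sum(1 for _, u, _ in rows if u > 0)
    n_shared = sum(1 for _, u, v in rows if u > 0 and v > 0)
    if n_u == 0:
        return 0.0
    return ((n_u * cosine_similarity(rows) + n_shared * peak_ratio(rows))
            / (n_u + n_shared))
```

Because the peak ratio sum starts at the second shared peak but is divided by the full count of shared peaks, two identical spectra with three peaks give a peak ratio of 2/3 and a composite score of 5/6 rather than 1.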

Compared with the SS combination algorithm proposed by Stein and Scott, the improved combination algorithm shifts its emphasis between the two measures according to how similar the spectra are. When the similarity of the mass spectra is low, peaks at the same mass-to-charge ratio differ more strongly in intensity between the spectra, and the peak ratio calculation is preferred. When the degree of similarity is high, the number of shared mass-to-charge ratios increases and the gap between the corresponding intensities decreases; in this case, the cosine similarity calculation is preferred to further improve the similarity between the mass spectra. The premise of the similarity calculation is to determine whether the parent ion is the same as the parent ion of a compound in the standard spectral library. If the error of the parent ion is within 2 mDa, the two are considered the same. The fragment ions are then compared and the similarity is calculated, and the result is combined with the relative retention time difference to select the best match, that is, the one with higher similarity and lower relative retention time difference. If the parent ions are considered different, the mass spectrum is ruled out directly and no subsequent calculation is performed.
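The gating and ranking logic described above can be sketched as follows. The 2 mDa tolerance is taken from the text; the `Candidate` record, the function names, and the way similarity and retention time difference are combined (similarity first, relative retention time difference as tie-breaker) are assumptions for illustration, since the text does not specify the exact ranking rule.

```python
from dataclasses import dataclass

PRECURSOR_TOL_DA = 0.002  # 2 mDa, as stated in the text


@dataclass
class Candidate:
    """Hypothetical library entry with a precomputed spectral similarity."""
    name: str
    precursor_mz: float
    retention_time: float  # minutes
    similarity: float      # 0..1, from the fragment-ion comparison


def relative_rt_difference(unknown_rt, candidate):
    """Retention time difference relative to the library value."""
    return abs(unknown_rt - candidate.retention_time) / candidate.retention_time


def rank_candidates(unknown_mz, unknown_rt, candidates, tol=PRECURSOR_TOL_DA):
    """Drop candidates whose parent ion is off by more than tol, then rank.

    Assumed ordering: higher similarity first, smaller relative
    retention time difference second.
    """
    hits = [c for c in candidates if abs(c.precursor_mz - unknown_mz) <= tol]
    return sorted(
        hits,
        key=lambda c: (-c.similarity, relative_rt_difference(unknown_rt, c)),
    )
```

A candidate that fails the parent ion check never reaches the ranking step, which mirrors the statement that no subsequent calculation is performed for it.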

2.3. Online Result Comparison Module

The online result comparison module is built with web technology. The front end uses components including jQuery, Echarts, layUI, and JSmol, and the back end uses frameworks including SpringBoot, SpringMVC, SpringSecurity, and Mybatis (http://blog.mybatis.org/) [27, 28]. The module includes pages for file uploading (Figure 1), a summary of screening results (Figures 2 and 3), a detailed comparison of screening results (Figures 4–6), and a basic information display of compounds (Figure 7). Its main function is to upload the mass spectrometry file to be screened, call the background screening model for comparison, and return the screening comparison results through the web page in real time. After the inspectors upload the file, the platform calls the data standardization software to convert the uploaded MS file into the standard mzML format. The data standardization software, ProteoWizard [29], supports mass spectrometry files generated by mainstream mass spectrometer manufacturers [14]. Thus, the construction and application of the platform are not limited to instruments of a specific brand. After the file conversion is completed, the system calls Python’s pymzML library to parse the mzML file and reads the information of the parent ions and their corresponding fragment ions, such as peak intensity, retention time, and high-resolution accurate mass-to-charge ratio. It then calls the screening model to compare the unknown spectrum with the standard spectrum library. It should be noted that when preparing the data, the inspectors should preprocess the sample according to the specified standard procedures, confirm that the high-resolution LC/MS instrument has been calibrated and performs well, and follow the recommended instrument method to collect data.
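The conversion and parsing steps might look roughly like the sketch below. It assumes that ProteoWizard's `msconvert` command-line tool is on the PATH and that pymzML is installed; the function names, the output-path convention, and the choice of centroided peaks are assumptions for illustration, not the platform's actual script.

```python
import subprocess
from pathlib import Path


def msconvert_command(raw_file, out_dir):
    """Build a ProteoWizard msconvert call that writes mzML into out_dir."""
    return ["msconvert", str(raw_file), "--mzML", "-o", str(out_dir)]


def convert_to_mzml(raw_file, out_dir):
    """Run msconvert and return the expected path of the mzML file.

    Assumes msconvert names the output after the input file stem.
    """
    subprocess.run(msconvert_command(raw_file, out_dir), check=True)
    return Path(out_dir) / (Path(raw_file).stem + ".mzML")


def read_ms2_spectra(mzml_path):
    """Yield parent ion m/z, retention time, and peaks for each MS2 scan."""
    import pymzml  # third-party; imported lazily so the module loads without it

    for spectrum in pymzml.run.Reader(str(mzml_path)):
        if spectrum.ms_level != 2:
            continue
        precursors = spectrum.selected_precursors
        if not precursors:
            continue
        yield {
            "precursor_mz": precursors[0]["mz"],
            "retention_time_min": spectrum.scan_time_in_minutes(),
            "peaks": [(float(mz), float(intensity))
                      for mz, intensity in spectrum.peaks("centroided")],
        }
```

Each yielded record carries the fields the screening model needs: the parent ion for the 2 mDa check, the retention time, and the fragment peaks for the similarity calculation.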

The online result comparison module handles the interaction between the user and the server through a file stream and a data stream. The user uploads the test files through the file stream. Since most uploaded test files are large, the platform adopts the Conris Ultra-High-Speed Transfer Protocol [30] instead of the traditional FTP transfer protocol, which greatly improves the upload speed. After a series of operations such as file conversion, data analysis, result sorting, and result display, the screening platform renders the screening results as various graphics, delivered through the data stream, on the basic information display page for users to read. On the basis of the screening results, the user can determine whether the test files contain risk substances and accordingly make a preliminary determination of whether the tested food is qualified.

Each entry on the screening results summary page includes the precursor ion, molecular formula, CAS number, and retention time of the unknown compound and of the matched compound in the standard library. An unknown compound may match multiple compounds in the standard library. In that case, the inspector can make a preliminary judgment of the most likely compound based on the matching score and the retention time difference between the unknown and the matched compounds. The detailed comparison page displays 2D and 3D bar charts comparing the unknown and the matched compounds, so the inspector can visually observe the similarities and differences between the two. By viewing the 2D and 3D molecular geometry diagrams and the basic information of the matched compound (including its relevant physical and chemical properties and information such as inspection standards and methods), the inspector can gain an intuitive and detailed understanding of the matched compound. From the information displayed on the platform, the inspector can make a preliminary judgment of whether the tested sample contains risk substances, which can guide subsequent experiments to reach a scientific conclusion more quickly.

3. Results and Discussion

3.1. Model Validation

The platform uses a comparison method to evaluate the screening model and then optimizes and adjusts the model based on the evaluation results. In this method, each test file is screened both by the platform and by the professional screening software of the corresponding manufacturer, the two sets of screening results are compared, and the accuracy of the screening model is calculated. The calculation formula is as follows:
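As an illustration of such a comparison, the sketch below assumes that accuracy is the percentage of compounds reported by the manufacturer's software that the platform also reports. This definition and the function name are assumptions made for the example and are not necessarily the formula used by the authors.

```python
def screening_accuracy(platform_hits, reference_hits):
    """Percentage of reference compounds that the platform also reports.

    platform_hits:  compound identifiers reported by the platform
    reference_hits: compound identifiers reported by the vendor software
    """
    reference = set(reference_hits)
    if not reference:
        raise ValueError("reference result set is empty")
    agreed = reference & set(platform_hits)
    return 100.0 * len(agreed) / len(reference)
```

Under this assumed definition, a compound that the reference software finds and the platform misses (a false negative) lowers the score, whereas extra platform hits do not affect it.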

After the first model was constructed, eight high-resolution test files were uploaded to the platform for comparison. The screening results revealed two problems: first, there were false negatives, that is, compounds contained in the test files were missing from the screening results; second, isomers were not completely distinguished.

3.2. Model Optimization

To solve these problems, the research team optimized the model in three technical directions. First, the number of selected spectra was reduced. Each test file contains thousands of spectra; the more spectra are selected initially, the more screening results are produced and the harder it is to select the best match, and analyzing every spectrum would greatly reduce the efficiency of the screening model. The number of spectra selected from the test file for each parent ion was therefore reduced. Specifically, the spectra were sorted by total energy, and those with higher energy were selected for analysis. Before the optimization, at most 30 spectra could be selected for one parent ion; now at most 20 are selected. Second, both the similarity and the retention time difference (the difference between the retention time of the mass spectrum and that of the compared compound in the standard spectrum library) were taken into account to avoid the bias of relying on a single factor. Third, the required number of matching secondary fragment ions was increased. According to the EU analytical method guidelines, if two compounds have the same precursor ion and at least one secondary fragment ion in common, they are most likely the same compound. However, such a small number of shared fragment ions can limit the accuracy of model screening, and some isomers produce the same fragment ions [31, 32]. Requiring at least two identical secondary fragment ions, given the same parent ion, distinguishes the isomers effectively.
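The first and third optimizations can be sketched as follows. The limits of 20 spectra and two shared fragment ions come from the text; interpreting "total energy" as the summed peak intensity, the 2 mDa fragment tolerance, and all function names are assumptions for illustration.

```python
MAX_SPECTRA_PER_PARENT = 20  # was 30 before the optimization
MIN_SHARED_FRAGMENTS = 2     # stricter than the single shared ion in the guideline


def top_spectra(spectra, limit=MAX_SPECTRA_PER_PARENT):
    """Keep the spectra with the highest summed intensity for one parent ion.

    spectra: list of peak lists, each a list of (mz, intensity) tuples.
    """
    by_energy = sorted(
        spectra,
        key=lambda peaks: sum(intensity for _, intensity in peaks),
        reverse=True,
    )
    return by_energy[:limit]


def shared_fragment_count(unknown, library, tol=0.002):
    """Count library fragments that have a partner in the unknown spectrum."""
    return sum(
        1
        for mz_v, _ in library
        if any(abs(mz_u - mz_v) <= tol for mz_u, _ in unknown)
    )


def passes_fragment_rule(unknown, library,
                         minimum=MIN_SHARED_FRAGMENTS, tol=0.002):
    """True when enough fragment ions agree between the two spectra."""
    return shared_fragment_count(unknown, library, tol) >= minimum
```

Raising `minimum` from 1 to 2 is what lets the rule reject an isomer that shares only one fragment ion with the library compound.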

3.3. Model Revalidation

After the team optimized the model, they verified the screening model again. They uploaded the previous eight high-resolution test files for screening comparison. Screening results show that the proportion of compounds successfully identified by the model increased to 97.29%. The comparison of the two screening results is shown in Table 2.

4. Conclusion

The platform established in this paper has stabilized after several rounds of model optimization and testing. Currently, the first phase of the platform construction has been basically completed, and the platform has entered small-scale trials. The trials so far show that, by using this database, more than 300 banned and restricted compounds have been discovered in actual food samples from daily monitoring and inspection. The platform has shown a strong capability for screening and identifying unknown compounds. Future work will continue to add compound data to the standard spectrum library, further expand the scope of screening, and promote joint construction, sharing, and verification through cooperating laboratories.

The construction of a food risk substance screening platform based on high-performance liquid chromatography/high-resolution mass spectrometry, the Internet, big data, and other technologies provides a new technical means for food safety risk management and control. It also builds a bridge between screening and risk assessment of risk substances and illegally added substances. It facilitates the full-chain online risk screening of food production and circulation, and it provides solid technical support for the intelligent supervision and inspection of food safety. It is reasonable to expect that this technology platform has a wider application prospect.

Combining computer technology and spectrum technology to create an online, real-time spectrum screening and comparison platform that is not limited by instrument brand is a new exploration. The approach can be applied not only in the food industry but also in industries such as cosmetics, chemicals, and environmental protection, where online spectrum screening and comparison systems can be established to serve risk management and control.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Special Project for Science and Technology of Food Safety (Grant no. 2017YFC1601300).