Abstract

At present, inspection at China's import ports generally relies either on manual selection based on experience or on random selection by the document review system according to a preset sampling ratio. To improve the detection rate of unqualified goods and make the best use of the limited human and material resources of inspection and quarantine institutions, a method combining fuzzy reasoning with DeepFM, a deep neural network with a factorization machine component, is proposed for the intelligent evaluation of risk sources of imported goods. Fuzzy reasoning is used to realize the fuzzy normalization of the dataset samples, and the DeepFM deep neural network is then trained to classify and evaluate the risks of goods. Experimental tests on a specific customs import and export dataset verify the effectiveness of the proposed method.

1. Introduction

With the rapid development of international trade and logistics, imported goods place higher demands on port customs clearance [1, 2]. The sharp increase in traded goods poses a challenge to the random inspection of inbound and outbound goods by Chinese customs [3]. At the current stage, China's customs department implements inspection and quarantine supervision on imported and exported goods, and the supervision differs by type of goods. Random inspection ratios are set manually for each type [4]. The system randomly selects goods according to the set ratio and issues inspection instructions. Goods of the same type are treated equally and inspected randomly and proportionally, so the selection depends on manually chosen ratios and ignores differences in risk within a type. At the same time, with the rapid improvement of computer intelligence worldwide and the gradual rise of labor costs, organizations face the need to use computers in place of humans to make intelligent decisions [5, 6]. This is forcing customs agencies to adopt new intelligent solutions to the existing problem of unreasonable random inspections [7, 8]. Therefore, establishing a scientific, rigorous, convenient, and efficient port inspection mode is the key to removing the bottlenecks that affect inspection and quarantine clearance of imported commodities [9, 10].

Because different cross-border mail items, countries of origin, senders, and recipients have different levels of credit, inspections can be based on multidimensional risk source databases, such as historical inspection and risk warning data, which are divided into very high, high, medium, low, and very low risk levels. According to investigations conducted by customs departments, products with very high and high risk levels account for only a small part of the total volume of imported and exported goods, and most goods have medium, low, or very low risk levels. Therefore, the existing uniform random deployment and random inspection methods allow high-risk biochemical products to flow into the market and cannot achieve a higher inspection rate for products that are sources of risk.

In risk assessment, each piece of cargo in a batch is analyzed, its risk level is evaluated, and the cargo is ranked by risk from high to low [11]. Cargo with the highest risk is inspected first, followed by cargo with the next highest risk, until the total quantity of cargo to be inspected is reached. With this method of inspection, the inspection objects are managed by class, the key points of inspection are highlighted, and the seizure rate increases while the inspection rate remains unchanged [12, 13]. This also rationally allocates the limited manpower and material resources of agencies such as China's Inspection and Quarantine Bureau.

By using historical declaration data and updated on-site inspection data to analyze the risk of import and export goods, resources can be allocated more rationally to the goods that must be inspected, so as to achieve optimal resource allocation. Cao [14] started from the vagueness and highly nonlinear characteristics of inspection and quarantine risk assessment itself; used nonstatistical methods for the first time to construct a scientific, reasonable, comprehensive, and objective indicator system; and, taking artificial intelligence algorithms as the basis, proposed combining fuzzy reasoning with a neural network algorithm to conduct a risk assessment of specific imported goods, so as to scientifically determine the inspection rate. In response to the time-consuming and low-efficiency problems of traditional mine-safety-accident data classification, Liu [15] proposed a classification method that combines a long short-term memory (LSTM) network with an attention mechanism and applied it to the classification of mine-accident levels. The results show that the proposed method improves classification accuracy. After establishing an export risk assessment system for the sporting goods manufacturing industry, Cai combined the efficiency coefficient method with principal component analysis to assess the export risk of that industry and reported good results [16].

In the present research, a multidimensional intelligent comprehensive risk analysis of cross-border mailings is conducted by combining traditional fuzzy theory with the deep-learning model DeepFM, and a risk-assessment method for imported goods based on the combination of fuzzy reasoning and DeepFM is proposed. In this method, a large amount of historical declaration and inspection data of imported goods is first preprocessed, and key field information is selected as the characteristic index of cargo risk evaluation according to expert experience. Fuzzy theory is then used to construct a fuzzy inference model of the risk-evaluation characteristic index, which realizes the fuzzy normalization of the dataset samples. Finally, the DeepFM deep neural network is trained to classify and evaluate the risks of goods, yielding an intelligent comprehensive evaluation system for the risk sources of imported goods. Results of experimental tests on a specific customs import and export dataset verify the effectiveness of the proposed method.

2.1. Fuzzy Reasoning

A fuzzy reasoning system, also called a fuzzy system, is a system based on fuzzy-set theory and a fuzzy reasoning method, and it is able to process fuzzy information [17]. It uses fuzzy logic as its main calculation tool and can realize complex nonlinear mappings, while both its input and its output are precise (crisp) values, so it has been widely used.

In using fuzzy theory as a solution, first, the risk indicators and proportion of unqualified levels are counted, the membership function selected, the corresponding inference rules formulated, and the fuzzy controller design completed. All of the statistical values are input into the fuzzy controller, and then the corresponding level of risk is inferred.

From a functional point of view, a fuzzy reasoning system must include the following steps [18].
Step 1. Fuzzification: convert the precise input quantities into fuzzy quantities.
Step 2. Fuzzy inference: apply the fuzzy rules, which are formed from fuzzy conditional statements and are the core of fuzzy controller design.
Step 3. Defuzzification: convert the output from a fuzzy quantity back into a precise quantity.

Fuzzy conditional statements are usually expressed as follows:
(1) "If A, then B"
(2) "If A, then B; else C"
(3) "If A and B, then C"
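As a small illustration of how the three statement forms can be evaluated on membership degrees in [0, 1], the sketch below computes rule firing strengths. It assumes the common choices of the minimum for "and" and the complement 1 - μ for "else"; these operators and the function names are illustrative assumptions, not details taken from the system described here.

```python
def rule_if_then(mu_a):
    """'If A, then B': B fires to the degree that A holds."""
    return {"B": mu_a}


def rule_if_then_else(mu_a):
    """'If A, then B; else C': C fires with the complement of A (assumed)."""
    return {"B": mu_a, "C": 1.0 - mu_a}


def rule_if_and_then(mu_a, mu_b):
    """'If A and B, then C': 'and' is modelled as the minimum (assumed)."""
    return {"C": min(mu_a, mu_b)}


# Example: A holds to degree 0.7 and B to degree 0.4.
strengths = rule_if_and_then(0.7, 0.4)
```

The resulting strengths are what a defuzzification step would later turn into a single precise output.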

2.2. DeepFM Model

Deep learning can ensure effective information extraction and feature expression, as well as the completion of tasks like image recognition, time-series prediction, and text prediction. Typical deep-learning networks include, e.g., convolutional neural networks (CNNs) [19], deep belief networks (DBNs) [20], and recurrent neural networks (RNNs) [21]. The DeepFM model, like the wide and deep model, is also a deep-learning model that is widely used in CTR (click-through-rate) prediction [22]. It integrates the embedded FM model framework and DNN-based neural network framework in parallel, which capture low-order and high-order feature combinations, respectively. When the data matrix is sparse and has a large dimension, such as users with special preferences or numerous technical service features, the dense embedding of the factorization machine (FM) part can make the embedding vectors correspond to each nonzero feature, increasing the generalization ability of the model [23]. The multilayer perceptron in the DNN part resolves the problem that the FM part cannot perform high-level feature mining so that the model has a certain generalization ability for high-level query feature items that have not appeared [24].
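The parallel structure can be summarized in a few lines. In the original DeepFM formulation, the scalar outputs of the FM part and the DNN part are added and passed through a sigmoid; the sketch below assumes that combination, and the function and argument names are hypothetical.

```python
import math


def deepfm_output(y_fm, y_dnn):
    """Combine the FM and DNN component outputs into one probability.

    Assumes the additive combination followed by a sigmoid, as in the
    original DeepFM formulation; both arguments are plain floats.
    """
    return 1.0 / (1.0 + math.exp(-(y_fm + y_dnn)))
```

Because both parts feed a single output, they are trained jointly rather than one after the other.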

2.2.1. FM Model

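The idea of the FM model is to additionally consider the interaction between any two features on top of the LR (logistic regression) model. The preliminary definition of the model equation is

$$y = w_0 + \sum_{i=1}^{n} w_i X_i + \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} w_{ij} X_i X_j,$$

where $w_{ij}$ is the cross-weight of the feature combination $X_i X_j$. Although this model introduces second-order feature combinations compared with the LR model, $w_{ij}$ cannot be learned and stays 0 whenever $X_i X_j = 0$, so in the case of large-scale sparse features most $w_{ij}$ will eventually become 0, and the model is then no different from LR. To deal with sparse features, the following improvement was made in the FM model:

$$y = w_0 + \sum_{i=1}^{n} w_i X_i + \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle X_i X_j,$$

where $\langle v_i, v_j \rangle$ is the dot product of two vectors of size $k$:

$$\langle v_i, v_j \rangle = \sum_{f=1}^{k} v_{i,f} \, v_{j,f}.$$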

With the improved FM model above, all first- and second-order feature combinations can be captured, and the model can still be trained when the data are sparse. The symbols are defined as follows.
(1) $w_0$ is the global bias, and $w_0 \in \mathbb{R}$.
(2) $w_i$ is the weight of the ith variable $X_i$, and $w \in \mathbb{R}^n$.
(3) $v_i$ is the embedding vector that must be learned for the ith variable $X_i$, and $v_i \in \mathbb{R}^{1 \times k}$; $k$ is the length of the vector, an important hyperparameter that represents the dimension of the decomposition and also reflects the complexity of the FM model.
(4) $\langle v_i, v_j \rangle$ is used in place of the original $w_{ij}$ to represent the weight of the second-order feature combination $X_i X_j$. This is the most significant difference between the FM model and polynomial regression. As long as feature $X_i$ has appeared in combination with any other feature (that is, $X_i X_j \neq 0$ for some $j$), its embedding vector $v_i$ can be learned by training. Therefore, at prediction time, even if the combination of $X_i$ and some feature $X_l$ never appeared in the training data, the inner product of their embedding vectors can still be calculated and used as the weight.
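The improved model can be sketched in plain Python as follows. The first function evaluates the pairwise term directly from the embedding vectors; the second uses the well-known algebraic reformulation of the pairwise sum (half of the squared sum minus the sum of squares, per embedding dimension), which is not stated in this section but gives the same value at lower cost. The function names and the toy numbers are illustrative assumptions.

```python
def fm_predict_naive(x, w0, w, V):
    """FM score with an explicit loop over all feature pairs (i < j).

    x: feature values (length n); w0: global bias; w: first-order weights;
    V: n embedding vectors, each of length k.
    """
    n = len(x)
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))  # <v_i, v_j>
            score += dot * x[i] * x[j]
    return score


def fm_predict(x, w0, w, V):
    """Same score using 0.5 * ((sum v x)^2 - sum (v x)^2) per dimension."""
    n, k = len(x), len(V[0])
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        score += 0.5 * (s * s - s_sq)
    return score


# Toy sparse sample: the second feature is absent (value 0).
x = [1.0, 0.0, 1.0]
w0, w = 0.1, [0.2, 0.3, 0.4]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
score = fm_predict(x, w0, w, V)
```

Only features with nonzero values contribute, which is why the embedding formulation remains usable on sparse data.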

2.2.2. DNN

The DNN part is a multilayer feedforward neural network used to learn high-order feature combination information. However, because the original input features in CTR estimation are highly sparse, ultra-high-dimensional, type-mixed (continuous and discrete), and grouped by fields, it is necessary to add an embedding layer before the ordinary feedforward neural network. Each feature group, whatever its input length, is compressed into a low-dimensional, dense, fixed-length vector of size k. Another advantage of the embedding layer is that the DNN part can share the embedding vectors with the FM part, which removes the need to train the embedding layer separately. Suppose that the output of the embedding layer is $a^{(0)} = [e_1, e_2, \ldots, e_m]$; $a^{(0)}$ is then the input of the feedforward neural network, m is the size of the eigenvector of all features after passing through the embedding layer, and $e_1$ is the value of the first dimension of the eigenvector.

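The forward-propagation process is as follows:

$$a^{(l+1)} = \sigma\left(W^{(l)} a^{(l)} + b^{(l)}\right),$$

where $l$ is the layer depth, $\sigma$ is the activation function, and $a^{(l)}$, $W^{(l)}$, and $b^{(l)}$ are the output, weight matrix, and bias of layer $l$, with $a^{(0)}$ denoting the output of the embedding layer.

Assuming that the number of hidden layers of the feedforward neural network is H, the final output of the DNN part is

$$y_{DNN} = \sigma\left(W^{(H)} a^{(H)} + b^{(H)}\right).$$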
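The forward pass through the hidden layers and the output layer can be sketched as below. The choice of ReLU for the hidden layers and a sigmoid for the output is an assumption made for illustration, since the activation functions are not specified here; the weights in the example are arbitrary toy values and the function names are hypothetical.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(W, a, b, activation):
    """One layer: activation(W a + b), with W given as a list of rows."""
    return [
        activation(sum(w_ij * a_j for w_ij, a_j in zip(row, a)) + b_i)
        for row, b_i in zip(W, b)
    ]


def dnn_forward(a0, hidden_layers, output_layer):
    """Propagate the embedding output a0 through H hidden layers,
    then through a single-unit output layer."""
    a = a0
    for W, b in hidden_layers:
        a = dense(W, a, b, relu)
    W_out, b_out = output_layer
    return dense(W_out, a, b_out, sigmoid)[0]


# Toy network: 2 inputs, one hidden layer with 2 units, 1 output.
a0 = [1.0, 2.0]
hidden = [([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0])]
output = ([[1.0, 2.0]], [-3.0])
y_dnn = dnn_forward(a0, hidden, output)
```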

3. Overall Model and Implementation Process

The entry-exit cargo system is a first-line business system that involves import and export trade. Cargo risk assessment is mainly conducted to predict the risk level of the cargo through historical data, so as to make a decision on whether the cargo must be inspected, which is an intelligent sampling inspection program.

In this study, the risk index system for imported goods is used as the evaluation benchmark, and a risk model for inspection and quarantine of import and export goods is proposed. The model is shown in Figure 1. It mainly consists of the following three steps.
Step 1. Establish an index system for risk evaluation of imported goods. Using the expert investigation method, and based on the experience of experts, four key fields of information, that is, cargo tax number code, company code, country of departure or destination, and country of transit, are selected from the customs import vehicle cargo database as the characteristic indicators for evaluating the risk source of the goods. These form the risk-indicator system of the cargo-inspection system.
Step 2. Because the cargo risk-characteristic indicators are all textual, fuzzy theory is used to construct a fuzzy reasoning model of the risk-evaluation characteristic indicators, and the fuzzy reasoning method is used to process the indicators in the system to realize the fuzzy normalization of the dataset samples. A database of risk index characteristics of imported goods is thereby established.
Step 3. Because the influence weights of the four risk-characteristic indicators on different goods are uncertain, the DeepFM deep-learning model is trained on the characteristic database to classify and evaluate cargo risks, which establishes an intelligent comprehensive risk-evaluation system for imported goods.

3.1. Fuzzy Normalization of Risk Index Data Based on Fuzzy Inference

The risk assessment of cross-border mail parcels is essentially the process of using neural networks to identify the risks of cross-border mailings. Risk-assessment indicators such as country of origin, inspection unit, consignee, and code are all textual descriptions and linguistic variables; their risk-level descriptions are also vague linguistic variables, such as high risk in Poland and low risk in the United States. In addition, the risk value of each indicator, as an input value of the neural network, must be normalized. In other words, a model must be established that can map a linguistic risk description of an indicator, such as "high risk," to a numerical value. This is essentially a quantitative analysis of a fuzzy concept. Therefore, in this study, fuzzy reasoning is used to establish a fuzzy evaluation model of the index risk value.

At the same time, since the input samples required by the neural network are usually values in the interval (0, 1), the data collected according to the risk indicator system determined above cannot be used directly in the neural network.

Here, the indicator “country of origin or destination” is taken as an example to introduce the fuzzy normalization of the data. Fuzzy theory is used as a solution. First, the risk indicators and proportion of unqualified levels are counted, the membership function is selected, the corresponding inference rules are formulated, the fuzzy controller design is completed, all the statistical values are input into the fuzzy controller, and then the corresponding level of risk is inferred.

Follow-up algorithm research and simulation are carried out according to the index system determined by expert experience. From the database of a specific customs agency in China, fields such as tax number code, Chinese tax number, company code, shipping country, transit country, and qualification status are selected. Qualified data are marked as 0 and unqualified data as 1. The data are shown in Table 1 and are referred to as the original data in this study.

The indicator "country of origin or destination" is used as an example for a detailed introduction. First, the cross-border mailings for each "country of origin or destination" within a period of time are selected. Two fields, the "nonconforming ratio" and the "total batch" of these mailings, are set as the inputs of the fuzzy model, and the integrity of the "country of origin or destination" is used as the output of the fuzzy model. This part of the study uses the Python programming language to establish a fuzzy reasoning system that achieves the fuzzy normalization of the sample data.

The detailed steps are the following:
(1) Delimit the corresponding fuzzy domain; that is, form fuzzy sets and convert the numerical inputs into linguistic variables. The proportion of unqualified goods for the country of departure or destination can take the values {very low, low, high, very high}. The total batches for the country of departure or destination can take the values {less, medium, more}. The integrity of the country of departure or destination can take the values {A, B, C, D, E}.
(2) Select the membership function to make the input fuzzy. After many tests, the trapezoidal membership function was selected, and the membership function expressions for the two input variables, total batch and proportion of unqualified results, were configured as shown in Figures 2 and 3.
(3) Enter the fuzzy reasoning rules into the fuzzy rule editor in Python, as shown in Table 2. For example, when the total number of batches is large and the proportion of unqualified data is low, the output integrity is B.
(4) Select the trapezoidal output membership function (Trapmf) to make the output level fuzzy according to five risk levels (A, B, C, D, and E), as shown in Figure 4.
(5) After fuzzy normalization, sample data that can be used for deep-learning training are obtained. Table 3 is the data table after fuzzy normalization of the original data in Table 1.
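The five steps above can be sketched end to end in plain Python. Everything numeric in this sketch is an assumption made for illustration: the trapezoid breakpoints stand in for Figures 2 to 4, the rule table stands in for Table 2 (only the rule "more batches and low unqualified proportion gives B" is taken from the text), level A is assumed to lie near 1 on a (0, 1) integrity scale, and centroid defuzzification is an assumed choice.

```python
def trapmf(x, a, b, c, d):
    """Trapezoidal membership: 0 outside [a, d], 1 on [b, c], linear between."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


# Hypothetical breakpoints for the two inputs and the output.
RATIO = {"very low": (0.0, 0.0, 0.02, 0.05), "low": (0.02, 0.05, 0.10, 0.15),
         "high": (0.10, 0.15, 0.25, 0.35), "very high": (0.25, 0.35, 1.0, 1.0)}
BATCH = {"less": (0, 0, 20, 50), "medium": (20, 50, 150, 300),
         "more": (150, 300, 1e9, 1e9)}
OUT = {"A": (0.8, 0.9, 1.0, 1.0), "B": (0.6, 0.7, 0.8, 0.9),
       "C": (0.4, 0.5, 0.6, 0.7), "D": (0.2, 0.3, 0.4, 0.5),
       "E": (0.0, 0.0, 0.2, 0.3)}

# Hypothetical rule table: (ratio label, batch label) -> integrity level.
RULES = {
    ("very low", "less"): "B", ("very low", "medium"): "A", ("very low", "more"): "A",
    ("low", "less"): "C", ("low", "medium"): "B", ("low", "more"): "B",
    ("high", "less"): "D", ("high", "medium"): "D", ("high", "more"): "C",
    ("very high", "less"): "E", ("very high", "medium"): "E", ("very high", "more"): "E",
}


def fuzzy_normalize(ratio, batches, steps=1000):
    """Map (unqualified ratio, total batches) to an integrity score in [0, 1]."""
    # Steps 1-3: fuzzify both inputs and fire every rule ("and" = min).
    strength = {}
    for (r_label, b_label), level in RULES.items():
        fire = min(trapmf(ratio, *RATIO[r_label]), trapmf(batches, *BATCH[b_label]))
        strength[level] = max(strength.get(level, 0.0), fire)
    # Steps 4-5: clip each output set by its strength, then take the centroid.
    num = den = 0.0
    for i in range(steps + 1):
        y = i / steps
        mu = max(min(s, trapmf(y, *OUT[level])) for level, s in strength.items())
        num += y * mu
        den += mu
    return num / den if den else 0.0  # 0.0 only if no rule fired at all


score = fuzzy_normalize(ratio=0.07, batches=500)
```

With these assumed sets, a country with many batches and a low unqualified proportion falls entirely in level B, so its score is the centroid of the B trapezoid.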

3.2. Implementation Process of Risk-Assessment Model for Import and Export Goods Based on Fuzzy Reasoning and DeepFM

Figure 5 describes the risk-source evaluation model based on fuzzy reasoning and DeepFM. The specific steps are the following:
(1) Obtain the information on import and export goods (such as company code, shipping country, and destination country) and apply fuzzy normalization to map each item to a value between 0 and 1.
(2) Use the fuzzy normalized values as training data and the random inspection results (qualified or unqualified) of import and export goods as the target values. With a fast and effective gradient-based parameter-optimization method (such as stochastic gradient descent or Adam), train a DeepFM model that can accurately evaluate the risk value of import and export goods, and obtain the evaluation list.
(3) Divide the goods into risk levels according to the values in the evaluation list.
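Step (3) can be illustrated with a short sketch that ranks cargo by predicted risk value, selects the highest-risk items up to an inspection quota, and assigns one of five risk levels. The cut points, level names, and cargo identifiers below are hypothetical; the thresholds actually used are not given here.

```python
def rank_for_inspection(scores, quota):
    """Return the ids of the `quota` highest-risk items, highest first.

    scores: dict mapping a cargo id to its predicted risk value in [0, 1].
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:quota]


def risk_level(score, cuts=(0.2, 0.4, 0.6, 0.8)):
    """Assign one of five levels using assumed, evenly spaced cut points."""
    labels = ("very low", "low", "medium", "high", "very high")
    return labels[sum(score >= c for c in cuts)]


# Hypothetical evaluation list for four pieces of cargo.
scores = {"cargo-a": 0.91, "cargo-b": 0.15, "cargo-c": 0.55, "cargo-d": 0.72}
to_inspect = rank_for_inspection(scores, quota=2)
levels = {cargo: risk_level(s) for cargo, s in scores.items()}
```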

4. Experimental Tests

4.1. Dataset

To facilitate subsequent algorithm research and simulation, the sample data were processed based on the index system determined by expert experience, and 11,871 batches of data from a certain period of time were collected as experimental data. The dataset contains 10,000 batches of "qualified" data and 1,871 batches of "unqualified" data. Qualified data are marked 0 and unqualified data 1. The data are shown in Table 1 and are referred to as the original data in this study.

Two inspection algorithms, (i) fuzzy reasoning [25] and (ii) the efficiency coefficient method with principal component analysis (EC-PCA) [16], are selected and compared with the fuzzy reasoning-DeepFM model proposed in this research. In the fuzzy reasoning algorithm, after fuzzy normalization of the risk-evaluation index data, a weighted average of the four key indicators directly gives the risk value of each piece of cargo, and a high risk value is regarded as an indicator of high-risk cargo.
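The fuzzy reasoning baseline reduces to a weighted average over the four normalized indicators. A minimal sketch follows; the equal default weights and the example values are assumptions, since the weights used in the experiments are not reported here.

```python
def baseline_risk(indicators, weights=None):
    """Weighted average of fuzzy-normalized indicator values in [0, 1].

    indicators: values for tax number code, company code, country of
    departure or destination, and country of transit (order assumed).
    weights: optional per-indicator weights; equal weights if omitted.
    """
    if weights is None:
        weights = [1.0] * len(indicators)
    return sum(w * v for w, v in zip(weights, indicators)) / sum(weights)


# Hypothetical normalized values for one piece of cargo.
risk = baseline_risk([0.2, 0.4, 0.6, 0.8])
```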

4.2. Experimental Results and Analysis

Using the inspection data from a certain period, 5,000 pieces were selected as the test set, and four inspection methods were applied: random control and three kinds of algorithm-based control, namely fuzzy reasoning, EC-PCA, and fuzzy reasoning-DeepFM. The four methods were compared using the detection rate. For the test set of 5,000 pieces, the unqualified rate was about 8%. Under random control, 500 pieces were randomly inspected and 43 were found to be unqualified, for a detection rate of 8.6%. Under algorithm control, the 500 pieces with the highest risk value were selected by each risk-evaluation algorithm. The fuzzy reasoning algorithm found 163 unqualified pieces, for a detection rate of 32.6%. The EC-PCA algorithm found 173 unqualified pieces, for a detection rate of 34.6%. The fuzzy reasoning-DeepFM model found 182 unqualified pieces, for a detection rate of 36.4%. The results are shown in Table 4.

Table 4 shows that the detection rate of the fuzzy reasoning-DeepFM model is approximately four times that of the random control (36.4% versus 8.6%). In practical work, this means that when inspection and quarantine manpower is insufficient, algorithm-based control can ensure a higher detection rate for the same number of inspections. The detection rate of the proposed model is also 11.7% and 5.2% higher, in relative terms, than those of fuzzy reasoning and EC-PCA, respectively. Therefore, the model based on the combination of fuzzy reasoning and DeepFM proposed in this paper achieves good results when applied to the risk assessment of import and export goods, which verifies the feasibility of the method.
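The figures quoted above can be reproduced from the counts reported for the 500 inspected pieces. The short check below uses only those counts; the variable names are illustrative.

```python
inspected = 500
found = {
    "random": 43,
    "fuzzy reasoning": 163,
    "EC-PCA": 173,
    "fuzzy reasoning-DeepFM": 182,
}

# Detection rate in percent for each method.
rate = {method: 100.0 * n / inspected for method, n in found.items()}

proposed = rate["fuzzy reasoning-DeepFM"]
# Ratio of the proposed model's detection rate to random control.
lift_over_random = proposed / rate["random"]
# Relative improvement in percent over the two baseline algorithms.
gain_over_fuzzy = 100.0 * (proposed / rate["fuzzy reasoning"] - 1.0)
gain_over_ecpca = 100.0 * (proposed / rate["EC-PCA"] - 1.0)
```

The 11.7% and 5.2% figures are therefore relative gains in detection rate; the corresponding absolute differences are 3.8 and 1.8 percentage points.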

5. Conclusions

In this paper, the inspection rate of import and export goods is discussed, and the framework of a risk-evaluation system for import and export goods is given. A risk-assessment method for import and export goods based on fuzzy reasoning and DeepFM is put forward; the fuzzy normalization of declaration and inspection data is realized; and training, testing, and comparison with two other methods (fuzzy reasoning and EC-PCA) are conducted. The results show that, compared with these methods, the method based on fuzzy reasoning and DeepFM is a feasible solution for the risk assessment of cross-border postal parcels. The scheme has high stability, precision, and accuracy and can realize the transformation from manual random control to scientific control.

Data Availability

The data are available from the corresponding author Yuanyuan Xu upon request via email ([email protected]).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was supported by the National Key R&D Program of China (2018YFC0809200).