Abstract

With national policy support and the rapid development of Internet technology, domestic consumption levels have been rising and the consumption structure has changed. Because of data security and privacy protection concerns, the traditional retail industry cannot integrate all the relevant data, so it is unable to adjust sales strategies in an accurate and timely manner. New retail has sounded the clarion call for the retail revolution. Demand forecasting is an important problem in supply chain management. In this research, we propose a new retail supply chain commodity demand forecasting framework based on vertical federated learning, which addresses the data security and privacy problems faced by new retail both theoretically and empirically. In experiments, we use datasets from different platforms (such as social platforms, e-commerce platforms, and retailers) in the same region for federated learning. The experimental results demonstrate the superiority of the proposed algorithm.

1. Introduction

In the context of the rapid development of artificial intelligence and Internet technology, new retail [1] has become a new trend in the transformation of traditional retail. Driven by 5G [2], artificial intelligence [3], big data [4], and other technologies, businesses can use artificial intelligence (AI) technology [5] to determine users’ interests and hobbies and use the results of predictive models to guide personalized product recommendations, commodity supply chain demand forecasting, and advertising placement. However, simply pooling the data of each enterprise is not compatible with data security and user privacy protection. The protection of sensitive data is usually in conflict with the marketing process.

In recent years, countries around the world have enacted laws on data security and the protection of their citizens’ private data. The General Data Protection Regulation (GDPR) [6] promulgated by the European Union came into effect on May 25, 2018. The Personal Information Protection Law was officially implemented in China on November 1, 2021. Federated learning (FL) [7–9] is a distributed machine learning (ML) [10] framework. In federated learning, multiple clients collaborate to solve traditional distributed ML problems under the coordination of a central server, while ensuring that local private data are not leaked.

In current retail collaboration, however, multiple parties cooperate at the data level: they aggregate their data and then train machine learning models on it to guide advertisement placement, personalized recommendations, supply chain [11] demand forecasting, etc. In this case, each party has access to the detailed data of the other parties. Such simple data fusion clearly does not meet the requirements of laws and regulations on data security and privacy.

In this work, we propose a new supply chain commodity demand forecasting framework based on vertical federated learning, which addresses the data security and privacy problems faced by new retail both theoretically and empirically. In the new retail industry, the data collected are mainly related to customers’ personal preferences, online consumption [12] records of goods, local consumption records, and relevant information about goods. These data may come from different platforms or departments. For example, the social platform [13] holds the user’s personalized characteristics, such as the user’s content, browsing records, and topic discussions. The e-commerce platform [14] holds customers’ online consumption records, such as the type of goods purchased and the consumption time. Retailers [15] hold users’ offline consumption records, such as the amount spent and the type of goods purchased. By using the characteristics of FL [7], we can build an FL model for the three parties without exporting any company’s data, thereby protecting data privacy and data security.

Compared with traditional machine learning [16–18] algorithms, the advantage of the vertical federated learning method is that it can use a long short-term memory (LSTM) [19–21] network to predict the future sales of goods in a certain area by integrating data from social networks, e-commerce platforms, and retailers while ensuring that the data are not leaked. The main contributions of our work are summarized as follows:
(1) We propose a new retail supply chain commodity demand forecasting framework based on vertical federated learning, which addresses the data security and privacy problems faced by new retail both theoretically and empirically.
(2) The forecasting framework provides an effective sales prediction method and outperforms other federated learning-based machine learning methods in terms of privacy protection of commercial data.

The structure of this article is as follows: Section 2 introduces some background and related works. Section 3 introduces the proposed method. Section 4 shows the experimental verification results. Section 5 is the conclusion.

2.1. Demand Forecasting in the Supply Chain

The supply chain is a large network composed of suppliers, manufacturers, warehousing [22], distribution centers [23], retailers, and consumers, and it includes every link from raw material procurement [24], semifinished product manufacturing, and finished product manufacturing to product transportation and product sales. Supply chain management [25–27] is a highly integrated management model that spans multiple companies and departments. It is organized around information flow, capital flow, and logistics, and it covers all aspects of the production and sale of commodities, including raw material procurement, order processing, production planning, inventory management, transportation, and sales.

With the development of economic [28] globalization and the continuous improvement of technology, enterprises have also begun to compete on their supply chains [29] and are committed to satisfying customers and providing suitable products to the market at the lowest cost and with the shortest logistics time. The introduction of artificial intelligence technology [30–32] has changed the organization, management, and tracking of the flow of goods and services in the supply chain. With the help of artificial intelligence technology, companies can develop more in-depth, predictive, and reliable ways of observing their business partners, and can even allow competitors to merge into the same large and complex supply chain. It enables enterprises to simplify supply chain management activities, achieve a more efficient and transparent cooperative relationship, and improve the effectiveness and accuracy of logistics decision-making.

Demand forecasting [33, 34] in the supply chain is of great significance to companies and users. Based on the results of the forecast, companies can adjust production capacity (such as seasonal labor), subcontract, build inventory, and postpone delivery to achieve supply management, and can use short-term price discounts and promotions to achieve demand management. Inventory backlogs or shortages are likely to cause cost losses for enterprises. If demand is accurately predicted, the forecast provides good guidance for production and greatly reduces overproduction and underproduction. In addition, in logistics and supply chains, products are often divided into functional and innovative products, but the two types are not sharply distinct. After small-scale feature recognition and pretraining of models for different types of products, artificial intelligence technology can cover and predict the demand for each type, improving the overall efficiency and interaction of the supply chain and optimizing the supply chain as a whole. For users, accurate forecasting reduces the problems of having nowhere to buy goods, slow logistics, and deterioration of goods due to excessive inventory or backlog. The potential needs of customers are foreseen and can be met in a timely manner. Short-term price discounts and promotions can further stimulate customer purchases.

2.2. Federated Learning

Federated learning is a distributed machine learning framework that uses privacy-preserving, secure encryption techniques so that decentralized participants can collaborate on training a machine learning model while ensuring that their private data never leave their local environment. This paradigm of machine learning on edge devices has been widely studied since 2017. Currently, based on the distribution of the data feature space and the sample ID space, researchers classify federated learning into three main categories: horizontal federated learning, vertical federated learning, and federated transfer learning.

Horizontal federated learning is suitable for situations where the participants’ data features overlap heavily and their sample IDs overlap little, for example, customer data from two banks in different regions. Each row of the data matrix (or of a table, such as an Excel table) represents a training sample, and each column represents a data feature (or label). Because there may be a lot of data, it is usually convenient to view the data as a table (for example, case data) in which one row represents one training sample. As shown in Figure 1(a), multiple rows of samples that share the same features across participants are combined for federated learning; that is, the training data of each participant are divided horizontally, which is why this setting is called horizontal federated learning.

Vertical federated learning is suitable for situations where the training sample IDs overlap heavily and the data features overlap little, for example, the common customer data of banks and e-commerce companies in the same area. As shown in Figure 1(b), different features of the samples that multiple participants have in common are combined for federated learning; that is, the training data of each participant are divided vertically, which is why this setting is called vertical federated learning. Vertical federated learning performs sample alignment in advance, that is, it first finds the samples that the participants have in common, because it only makes sense to combine different features of a common sample. Vertical federated learning therefore increases the feature dimension of the training samples.
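The sample alignment step described above can be illustrated with a minimal Python sketch. The party names, user IDs, and feature values are hypothetical, and the intersection is computed in plaintext only for illustration; a real deployment would match encrypted IDs and keep each party's features local rather than physically joining them.

```python
def align_samples(*party_tables):
    """Return the IDs shared by every party and the combined feature rows.

    Each table maps a sample ID to that party's list of feature values.
    """
    common_ids = set(party_tables[0])
    for table in party_tables[1:]:
        common_ids &= set(table)  # keep only IDs that every party holds
    common_ids = sorted(common_ids)
    # Concatenating per-party features widens each sample (more columns).
    combined = {
        sample_id: [v for table in party_tables for v in table[sample_id]]
        for sample_id in common_ids
    }
    return common_ids, combined


# Hypothetical toy data: user ID -> features held by each party.
social = {"u1": [0.2, 5], "u2": [0.9, 1], "u3": [0.4, 7]}
ecommerce = {"u2": [3], "u3": [8], "u4": [2]}
retailer = {"u1": [19.9], "u2": [5.5], "u3": [42.0]}

ids, features = align_samples(social, ecommerce, retailer)
print(ids)             # IDs present in all three tables
print(features["u3"])  # social + e-commerce + retailer features of u3
```

Only users known to all three parties survive the alignment, and each surviving sample gains the union of the parties' feature columns.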

Federated transfer learning applies transfer learning to train jointly on data from different sources with different features, protecting data privacy without causing a serious loss of accuracy. As shown in Figure 1(c), when both the training data features and the data samples of different participants overlap only slightly, the participants can cooperate through federated transfer learning. There are three main types of federated transfer learning: instance-based, feature-based, and model-based.

3. Method

In this section, we first formalize the federated learning (FL) problem, and then we present a new retail supply chain commodity demand forecasting method based on vertical federated learning.

3.1. Problem Formulation

In the retail industry, most of the data collected are related to customer purchasing power, personal preferences, and product-related information. In practical applications, these three kinds of features are usually split across three different departments or companies. Suppose there are $N$ companies, and each company has a private dataset that can only be stored and processed locally. The goal of federated learning is to collaboratively learn a global model from these scattered datasets. Taking secure linear regression as an example, the objective function of vertical federated learning can be defined as

$$\min_{\Theta_A,\Theta_B,\Theta_C}\ \sum_i \left\| \Theta_A x_i^A + \Theta_B x_i^B + \Theta_C x_i^C - y_i \right\|^2 + \frac{\lambda}{2}\left( \|\Theta_A\|^2 + \|\Theta_B\|^2 + \|\Theta_C\|^2 \right). \tag{1}$$

In equation (1), $\lambda$ refers to the regularization parameter, $x_i^A$ and $x_i^C$ refer to the local datasets of party A and party C, $(x_i^B, y_i)$ is the local dataset and labels of party B, and $\Theta_A$, $\Theta_B$, and $\Theta_C$ refer to the local model parameters of party A, party B, and party C, respectively.
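As a rough illustration of how such an objective decomposes across parties, the sketch below evaluates a regularized squared-error loss in which each party contributes only the partial prediction computed from its own features and parameters. The squared-error form with an L2 penalty weighted by lambda/2 is an assumption about the standard formulation, the function names are hypothetical, and everything is computed in plaintext for clarity.

```python
def partial_prediction(theta, x):
    """Inner product of one party's parameters with its local features."""
    return sum(t * v for t, v in zip(theta, x))


def vertical_objective(thetas, party_features, labels, lam):
    """Regularized squared error summed over all aligned samples.

    thetas[k] and party_features[k] belong to party k; only the label
    holder (party B in the text) needs access to `labels`.
    """
    loss = 0.0
    for i, y in enumerate(labels):
        prediction = sum(
            partial_prediction(theta, features[i])
            for theta, features in zip(thetas, party_features)
        )
        loss += (prediction - y) ** 2
    penalty = 0.5 * lam * sum(t * t for theta in thetas for t in theta)
    return loss + penalty


# Hypothetical toy problem: three parties with one feature each, two samples.
thetas = [[1.0], [2.0], [0.5]]
features_a = [[1.0], [2.0]]
features_b = [[0.0], [1.0]]
features_c = [[2.0], [4.0]]
labels = [2.0, 6.0]

value = vertical_objective(thetas, [features_a, features_b, features_c], labels, lam=0.1)
print(value)
```

In this toy case the predictions match the labels exactly, so the printed value consists of the regularization penalty alone.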

3.2. LSTM Network

LSTM was proposed by Hochreiter et al. in 1997 and was later improved and popularized by Alex Graves. The motivation of LSTM is to solve the long-term dependency problem. The output of a traditional RNN node is determined only by the weights, bias, and activation function. An RNN is a chain structure, and every time step uses the same parameters. LSTM can solve the long-term dependency problem of RNNs because it introduces a gate mechanism to control which features are passed on and which are discarded. Due to this design, LSTM is suitable for processing and predicting important events with long intervals and delays in time series.

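The input sequence of a standard RNN is $x = (x_1, \ldots, x_T)$. The RNN uses equations (2) and (3) to compute the hidden vector sequence $h = (h_1, \ldots, h_T)$ and the output vector sequence $y = (y_1, \ldots, y_T)$:

$$h_t = \mathcal{H}\left(W_{xh} x_t + W_{hh} h_{t-1} + b_h\right), \tag{2}$$

$$y_t = W_{hy} h_t + b_y. \tag{3}$$

Here, $W_{xh}$, $W_{hh}$, and $W_{hy}$ refer to the weight matrices from the input layer to the hidden layer, from the hidden layer to itself, and from the hidden layer to the output, respectively. $b_h$ and $b_y$ refer to the bias vectors, and $\mathcal{H}$ is usually set as the sigmoid function $\sigma(x) = 1/(1 + e^{-x})$.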
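A minimal pure-Python sketch of this recurrence may make the roles of the weights concrete. It assumes the usual form in which the hidden state is a sigmoid of an affine function of the current input and the previous hidden state, and the output is an affine function of the hidden state; the helper names and toy weights are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def rnn_forward(inputs, w_xh, w_hh, w_hy, b_h, b_y):
    """Run a simple RNN over a sequence and return the output at each step."""
    hidden = [0.0] * len(b_h)  # initial hidden state
    outputs = []
    for x in inputs:
        pre_activation = [
            a + b + c
            for a, b, c in zip(matvec(w_xh, x), matvec(w_hh, hidden), b_h)
        ]
        hidden = [sigmoid(p) for p in pre_activation]
        outputs.append([a + b for a, b in zip(matvec(w_hy, hidden), b_y)])
    return outputs


# Toy example: 1 input feature, 2 hidden units, 1 output, all-zero recurrences.
outputs = rnn_forward(
    inputs=[[1.0], [2.0]],
    w_xh=[[0.0], [0.0]],
    w_hh=[[0.0, 0.0], [0.0, 0.0]],
    w_hy=[[1.0, 1.0]],
    b_h=[0.0, 0.0],
    b_y=[0.0],
)
print(outputs)
```

The same parameters are reused at every time step, which is the chain structure described above.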

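LSTM is one of the RNN architectures, and it is specially designed to avoid the long-term dependency problem. Figure 2 shows the internal structure of the LSTM network unit: the input gate determines which new information enters the cell, the forget gate discards part of the stored information, and the output gate emits the required information, which makes the iteration more effective. LSTM essentially uses memory cells and gate units to mitigate vanishing and exploding gradients. The results of the memory cell and gate units are

$$\begin{aligned}
i_t &= \sigma\left(W_{xi} x_t + W_{hi} h_{t-1} + b_i\right),\\
f_t &= \sigma\left(W_{xf} x_t + W_{hf} h_{t-1} + b_f\right),\\
o_t &= \sigma\left(W_{xo} x_t + W_{ho} h_{t-1} + b_o\right),\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh\left(W_{xc} x_t + W_{hc} h_{t-1} + b_c\right),\\
h_t &= o_t \odot \tanh\left(c_t\right),
\end{aligned}$$

where $c_t$ is the memory cell state and $\odot$ denotes element-wise multiplication. The vectors of the input gate, forget gate, and output gate at time $t$ correspond to $i_t$, $f_t$, and $o_t$. The first step in LSTM is that the forget gate reads $h_{t-1}$ and $x_t$ to decide which information to discard. The input gate then determines how much new information is added. Finally, the output gate emits the part of the information that has been selected for output.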
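The gate mechanism can be sketched as a single LSTM time step in pure Python. This assumes the common formulation without peephole connections, in which each gate is a sigmoid of an affine function of the current input and the previous hidden state; the parameter layout and names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(w_x, w_h, bias, x, h):
    """Compute w_x @ x + w_h @ h + bias row by row."""
    return [
        sum(w * v for w, v in zip(row_x, x))
        + sum(u * v for u, v in zip(row_h, h))
        + b
        for row_x, row_h, b in zip(w_x, w_h, bias)
    ]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step; params maps 'i', 'f', 'o', 'g' to (w_x, w_h, bias)."""
    i = [sigmoid(z) for z in affine(*params["i"], x, h_prev)]    # input gate
    f = [sigmoid(z) for z in affine(*params["f"], x, h_prev)]    # forget gate
    o = [sigmoid(z) for z in affine(*params["o"], x, h_prev)]    # output gate
    g = [math.tanh(z) for z in affine(*params["g"], x, h_prev)]  # candidate
    # Forget part of the old cell state, then add the gated candidate.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


# Toy example: 1 input feature, 1 hidden unit, all weights and biases zero,
# so every gate equals 0.5 and the candidate equals 0.
zero = ([[0.0]], [[0.0]], [0.0])
params = {"i": zero, "f": zero, "o": zero, "g": zero}
h, c = lstm_step(x=[1.0], h_prev=[0.0], c_prev=[2.0], params=params)
print(h, c)
```

With all gates at 0.5, half of the previous cell state is forgotten and half of the squashed cell state is emitted, which shows how the gates scale the information flow.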

3.3. Vertical Federated LSTM Model

Under the framework of new retail supply chain demand forecasting, the data collected are mainly related to customers’ purchasing power, personal preferences, and product-related information. In practice, these three kinds of features may be held by three different departments or companies. To encrypt and integrate the data from the three platforms into a new dataset while ensuring that no data are leaked anywhere in the supply chain demand forecasting system, we designed the vertical federated LSTM model. The corresponding data processing framework is given in Figure 3.

As shown in Figure 4, the specific implementation steps of the vertical federated LSTM model are as follows:
(1) The central server sends the public key to the Fed-LSTM model and aligns the encrypted samples using the homomorphic encryption algorithm. In homomorphic encryption, each part of the data is encrypted using the key generated by the algorithm, and the data are decrypted after model training is completed.
(2) The three encrypted sample sets are input to the Fed-LSTM model for iterative training to obtain the local gradients.
(3) The Fed-LSTM model sends the obtained gradients and losses to the central server, which decrypts them using the private key.
(4) After processing the decrypted data, the central server sends the processed data back to the Fed-LSTM model corresponding to each sample set.
(5) The Fed-LSTM model updates its parameters and repeats the above steps until a better joint model is generated.
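The role of the key pair in steps (1) to (3) can be illustrated with a toy additively homomorphic scheme. The text does not name the encryption algorithm, so the sketch below assumes a Paillier-style scheme with generator n + 1; the primes are far too small to be secure, the fixed-point scale is arbitrary, and summing the ciphertexts before decryption is an illustrative choice rather than a step stated above.

```python
import math
import random

SCALE = 1000  # assumed fixed-point scale for encoding real-valued gradients


def generate_keys(p=293, q=433):
    """Toy Paillier key pair from two small primes (NOT secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse, requires Python 3.8+
    return n, (lam, mu)


def encrypt(n, value, rng=random):
    """Encrypt a real number with the public key n."""
    m = round(value * SCALE) % n  # negative values wrap around modulo n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (1 + m * n) * pow(r, n, n * n) % (n * n)


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % (n * n)


def decrypt(n, private_key, ciphertext):
    """Decrypt with the private key held by the central server."""
    lam, mu = private_key
    m = (pow(ciphertext, lam, n * n) - 1) // n * mu % n
    if m > n // 2:  # map the upper half of the range back to negatives
        m -= n
    return m / SCALE


# The server creates the keys and shares only the public key.
public_key, private_key = generate_keys()
# Each party encrypts its hypothetical local gradient value.
local_gradients = [1.5, -0.25, 2.0]
ciphertexts = [encrypt(public_key, g) for g in local_gradients]
# The ciphertexts are combined without revealing any individual value.
total = ciphertexts[0]
for c in ciphertexts[1:]:
    total = add_encrypted(public_key, total, c)
print(decrypt(public_key, private_key, total))
```

Only the holder of the private key can read the combined result, while the parties exchange ciphertexts alone.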

4. Experiments and Discussion

4.1. Datasets

For the experiment, we use datasets from different platforms in the same region (a social network platform, an e-commerce platform, and a retailer platform). The objective is to examine the relationship between customers and product sales in order to forecast future product sales. In this paper, we use this case to demonstrate the validity of our proposed model.

The social network dataset contains data on the use of social networks by users in the region, including their browsing history, participation in discussion topics, and personal preferences. The e-commerce dataset comes from an e-commerce platform. It contains the online consumption records of users in the region, such as the types of products purchased, the number of clicks on the products, and the number of favorites. The price tags of different products are also provided. The retailer dataset contains data on the types of products purchased by users, the retail prices of the products, and the sales volumes of the products.

We also evaluate the performance of our approach on the Alibaba Cloud dataset, which focuses on making inventory management decisions at the level of the replenishment unit (the minimum inventory management unit). The decisions are based on historical demand data, current inventory data, replenishment duration, and relevant information about the replenishment unit (product dimension and geographic dimension) over a past period, combined with “time series prediction,” “operational optimization,” and other technologies. The goal is to lower the inventory rate, and thus the inventory cost, while ensuring that inventory is very likely to meet demand without interruption.

4.2. Evaluation Metrics

For each model, the MAE, RMSE, and SD of its predictions are calculated under different hyperparameter settings, and the future commodity sales predicted by the different models are plotted as line graphs in the same figure for comparison.
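These metrics can be computed with a few lines of Python. MAE and RMSE follow their standard definitions; for SD, the sketch assumes the population standard deviation of the prediction errors, which is one plausible reading of the metric. The function names and sample values are hypothetical.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def error_sd(actual, predicted):
    """Population standard deviation of the prediction errors (assumed SD)."""
    errors = [p - a for a, p in zip(actual, predicted)]
    mean_error = sum(errors) / len(errors)
    return math.sqrt(sum((e - mean_error) ** 2 for e in errors) / len(errors))


# Hypothetical daily sales and model predictions.
actual_sales = [10, 12, 14]
predicted_sales = [11, 12, 12]
print(mae(actual_sales, predicted_sales))
print(rmse(actual_sales, predicted_sales))
print(error_sd(actual_sales, predicted_sales))
```

Lower values indicate predictions that track actual sales more closely; RMSE penalizes large individual errors more heavily than MAE.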

4.3. Experiment Set-Up

A two-layer LSTM network with a dropout layer is compared with a variety of typical machine learning algorithms (linear regression, XGBoost, and LightGBM). All federated learning algorithms used in this paper are shown in Table 1.

The three datasets used in this paper are processed by a homomorphic encryption algorithm to avoid problems such as data leakage. They are then used to train the federated learning algorithms listed above to obtain the corresponding training parameters, and finally, all the parameters are merged into a unified model.

4.4. Experiment Details

The experiments use product data from the region for product sales prediction. The data include a variety of features such as list price, actual selling price, number of clicks, and number of favorites. To verify that the Fed-LSTM model yields better predictions, the MSE, RMSE, and SD between its predicted sales and the actual sales are compared with those of the other three methods. To show the advantages of the proposed Fed-LSTM model more clearly, the predicted and actual sales obtained by each method are compared in Figure 5.

The red part represents the real sales, the blue part represents the predicted sales, and the purple part, where the two overlap, indirectly shows the accuracy of the method. Fed-LSTM has the highest prediction accuracy of the four methods, although its performance is not optimal during the peak sales period, whereas GBDT predicts well only for part of the peak sales period. The evaluation index values of the four methods are shown in Table 2.

Table 3 shows the performance of the same methods on the Alibaba Cloud dataset. Similar to the results in Table 2, Fed-LSTM achieves better performance than LR, XGBoost, and GBDT. On these time series datasets, the LSTM model is capable of capturing historical information and thus can outperform the other machine learning algorithms.

5. Conclusion

In summary, we used long short-term memory networks based on federated learning to predict the future sales of goods in a region. The approach integrates data from social networks, e-commerce platforms, and retailers, encrypts and aggregates all the data while ensuring that none of it is disclosed, and updates the model parameters to their optimal values once training is complete. Experiments show that Fed-LSTM provides an effective sales prediction method and outperforms other federated learning-based machine learning methods in terms of privacy protection of commercial data.

Data Availability

The experimental data used to support the findings of this study are available from the corresponding author upon request and have been deposited in the GitHub repository https://github.com/yuxiaowww/BDCI-2018-Supply-Chain-Demand-Forecast.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Nos. 62003279 and 61973249), the Shaanxi Association for Science and Technology Young Talent Lifting Program (No. XXJS202242), the Key R&D Programs of Shaanxi Province (No. 2021ZDLGY02-06), the Qinchuangyuan High-Level Innovative and Entrepreneurial Talent Project (No. 2021QCYRC4-49), the National Defence Science and Technology Key Laboratory Fund Project (No. 6142101210202), the Education Department of Shaanxi Province of China (No. I9JC041), and the Qinchuangyuan “Scientist + Engineer” program (No. 2022KXJ-169).