Abstract

Applications of the Internet of Vehicles (IoV) make human life more intelligent and convenient. However, IoV currently faces problems such as data silos and poor privacy preservation. To address these challenges, we propose a blockchain-based federated learning pool (BFLP) framework. BFLP allows models to be trained without sharing raw data, and it can choose the most suitable federated learning method according to the actual application scenario. Considering the poor computing power of vehicle systems, we construct a lightweight encryption algorithm called CPC to protect privacy. To verify the proposed framework, we conducted experiments in obstacle-avoidance and traffic-forecast scenarios. The results show that the proposed framework can effectively protect users' privacy and that it is more stable and efficient than traditional machine learning techniques. We also compare the CPC algorithm with other encryption algorithms, and the results show that its computation cost is much lower than that of other symmetric encryption algorithms.

1. Introduction

Internet of Vehicles (IoV) is a new type of industry that deeply integrates automobiles, electronics, information communication, transportation, and traffic management. As a necessary technical means of realizing intelligent transportation systems and automatic driving, IoV is a core technology for solving current traffic problems. In IoV, data exchange between on-board units (OBUs), roadside units (RSUs), and the mobile network enables information sharing between vehicles and all systems. For example, in vehicle-to-vehicle (V2V) communications [1], one of the IoV applications, the functions running on sensor nodes can be part of the on-board system of each vehicle, and the functions running on base stations can run on roadside devices. In this way, the local transportation department can accurately and comprehensively grasp real-time traffic information to conduct intelligent analysis and make decisions, and the resulting decision information is fed back to each vehicle for emergency warning of traffic incidents and path planning [2]. The development of IoV and its service capability has spawned many applications, such as driving safety applications, traffic efficiency applications, and entertainment applications. At present, countries around the world are actively researching IoV-related technology and strongly supporting its development. In 2016, the U.S. Department of Transportation officially released Federal Motor Vehicle Safety Standard No. 150 (FMVSS No. 150), which requires all light vehicles to install vehicle communication equipment to ensure the real-time transmission of safety information between vehicles. In 2018, the standardization of the globally unified vehicle networking communication standard LTE-V2X, formulated by the international standards organization 3GPP, was completed. In 2020, China's national standard system for IoV was completed. With the application of IoV, vehicles can obtain more information than a single vehicle, which is of great significance for improving traffic efficiency, reducing the incidence of traffic accidents, and improving traffic management.

The transmission and sharing of data in IoV bring great value, but if data are leaked in the process of transmission, storage, or sharing, serious risks may be caused to users and society. Compared with servers, the computing power of on-board systems is usually poor, so there may be loopholes in various security protection measures. Due to the working mechanism of IoV, vehicles need to regularly exchange basic safety messages (BSMs) with other vehicles and RSUs. BSM messages are generally sent in broadcast form without encryption, which brings a privacy leakage risk to each node in IoV [3]. The collection and analysis of data can help the transportation department make better decisions, but it may cause privacy problems for users. Research has shown that activities in physical space and virtual space can influence one another [4]. There have been several IoV security incidents since 2015. Vehicles were remotely compromised by attackers because of a vulnerability in the BMW ConnectedDrive system, putting more than 2 million vehicles at risk of being attacked by hackers. In 2017, Nissan announced that the database of its vehicle financing department had been breached, and the personal information of as many as 113 million customers was exposed. In 2020, Tesla was exposed to an iBeacon privacy leak, which caused great distress to car owners regarding their privacy.

To solve the dilemma of data silos and data privacy, Google and WeBank have proposed different federated learning (FL) frameworks. In federated learning, participants' data are kept locally, no privacy is disclosed, and the participants jointly build the federated model and share its benefits [5]. Compared with traditional machine learning, the nodes in federated learning are unstable and highly autonomous: vehicles move frequently and at high speed, drive in and out of the network at any time, and their density varies greatly across space-time scenarios. Therefore, federated learning is well suited to IoV. Through federated learning, each vehicle can jointly train the machine learning model without exchanging local data [6].

To solve a series of security problems in the IoV environment, we propose a blockchain-based federated learning pool (BFLP) framework and integrate a lightweight encryption algorithm into it. In this framework, we combine blockchain and federated learning to protect users' data privacy. Considering the poor computing power of vehicle systems, we use a lightweight encryption algorithm to encrypt users' data to ensure security when vehicles share information. At the same time, we construct a federated learning pool module (an adaptive learning module) so that the server can automatically select the corresponding federated learning method according to the distribution characteristics of the data sources. Through BFLP, not only can users' data privacy be better protected, but the trained model can also be improved. Our main contributions in this paper are as follows:

(i) We propose a federated learning pool (FLP) module to train the model, which can choose the most suitable federated learning method according to the present application scenario.

(ii) We construct a lightweight encryption algorithm called CPC for the vehicle system to encrypt users' data.

(iii) We use the blockchain as the bottom layer of the framework to facilitate the sharing and storage of users' information, ensure the reliability of data transmission, and protect users' privacy.

The remainder of this paper is organized as follows. In Section 2, we will introduce our related work. In Section 3, we will introduce our privacy protection model in detail. In Section 4, we verify the proposed framework by experiments and discuss the experimental results and the security of our framework. Finally, Section 5 concludes this article and discusses future work.

2. Related Work

2.1. Federated Learning

The great success of AlphaGo in 2016 showed the great potential of artificial intelligence (AI). Although AI has high commercial value, the reality is disappointing: AI depends on data from many fields for training, but due to industry competition, privacy security, and other issues, data exist in the form of data silos. At the same time, data security and privacy have received unprecedented attention in recent years. China implemented the Network Security Law of the People's Republic of China in 2017, and the EU formally implemented the General Data Protection Regulation (GDPR) in 2018. To address data silos and privacy protection, federated learning was created. Figure 1 shows a traditional FL model. In federated learning, each node jointly trains the model under the coordination of the server, and the training data are stored locally without being shared with others.

Compared with traditional machine learning, federated learning does not need to collect all the data yet achieves a similar modelling effect, which greatly reduces the privacy risk and cost of machine learning. Owing to its privacy protection, federated learning has been applied in many fields. In [7], Yan et al. applied federated learning to variation-aware learning over multisource decentralized medical image data, combining the small medical image data sets of different institutions into a large medical image data set. Kang et al. integrated federated learning modules into mobile networks to enable mobile devices to train and share models without leaking their local data [8]. Niknam et al. applied federated learning to wireless communication to protect privacy [9].

For different data set distributions, Yang et al. [5] divided federated learning into horizontal federated learning, vertical federated learning, and federated transfer learning, which enriched the scope of federated learning. The frequent movement of vehicle nodes makes IoV very suitable for federated learning, and many researchers have applied federated learning to IoV. Lu et al. applied federated learning to vehicular cyber-physical systems to protect data privacy [10]. With the emergence of federated learning, users' privacy can be further protected while performing machine learning. Pokhrel and Choi proposed an autonomous blockchain-based federated learning (BFL) design for privacy-aware and efficient vehicular networking [11]. Elbir and Sinem applied federated learning to vehicular networks to develop an intelligent transportation system [12]. Federated learning keeps individual training data local while building a model together, which greatly reduces the possibility of users' privacy leakage.

2.2. Homomorphic Encryption

Homomorphic encryption (HE) is a special encryption technique based on the computational complexity theory of mathematical problems, which allows encrypted data to be processed without the private key. Beyond the basic encryption operations, the biggest feature of homomorphic encryption, compared with general encryption algorithms, is that it allows computation to be performed directly on ciphertext, and the decrypted results are the same as those obtained by computing directly on the plaintext. Therefore, homomorphic encryption provides higher security. As early as 1978, Ron Rivest and Leonard Adleman put forward the concept of homomorphic encryption. A homomorphic encryption scheme can satisfy additive homomorphism and/or multiplicative homomorphism, and such schemes are divided into somewhat homomorphic encryption and fully homomorphic encryption (FHE). The elliptic curve cryptography (ECC) proposed by Neal Koblitz and Victor Miller in 1985 underlies partially homomorphic schemes satisfying additive homomorphism. The famous RSA algorithm is a partially homomorphic encryption algorithm satisfying multiplicative homomorphism. In 2009, Craig Gentry, a Ph.D. student at Stanford University, constructed a fully homomorphic encryption scheme based on ideal lattices, which marked a breakthrough in homomorphic encryption technology.

Because of its higher security, many scholars have applied homomorphic encryption to the Internet of Things (IoT), biometric authentication, blockchain, and other fields for privacy protection. Zouari and Hamdi applied homomorphic encryption to IoT and safely realized the effective combination of information from multiple nodes [13]. Salem et al. applied homomorphic encryption to biometric identification, which improved the security of biometric authentication [14]. Besides, homomorphic encryption can also provide a method for privacy protection in blockchain: She et al. integrated a homomorphic encryption algorithm into blockchain and applied it to smart home systems to protect users' privacy [15]. Liang proposed a circuit copyright protection blockchain based on homomorphic encryption, which can effectively solve the security problems of circuit copyright transactions [16].

Homomorphic encryption provides a new method for privacy protection [17]. It can separate the ownership and processing rights of data and compute directly on ciphertext, so the original data of any participant will not be leaked and users' privacy can be well protected.

2.3. Blockchain

Blockchain was originally proposed by researchers in 1991, aiming to add timestamps to digital documents so that they could be traced and could not be tampered with. In 2009, Nakamoto applied blockchain to the management of the Bitcoin financial system for the first time. The core of blockchain consists of the distributed ledger, decentralization, smart contracts, and the consensus mechanism. Besides, typical blockchain schemes also have the characteristics of a reliable database, trustlessness, transaction quasi-anonymity, and open-source programmability, so we can track the data stored in blocks safely and transparently. In 2017, Walmart, IBM, and JD.com launched the Blockchain Food Safety Alliance in China. Maersk and IBM launched TradeLens to support information sharing and promote efficient and safe global trade. Blockchain can also be used in electronic voting, copyright protection, and medical fields.

Blockchain has higher security; it creatively uses hash calculation, proof of work, and distributed storage to make tampering with blocks almost impossible. At present, researchers have used blockchain to solve privacy protection problems in the Internet of Vehicles. In [18], Pokhrel and Choi applied blockchain and federated learning to autonomous driving to protect users' privacy and used a blockchain incentive mechanism to reward the worker nodes that performed better in federated learning, encouraging them to participate more actively. Das et al. applied blockchain to vehicle theft prevention to ensure vehicle safety and owners' privacy through smart contracts [19]. In [20], Rawat applied blockchain to V2X communication to protect data privacy. Liu et al. proposed an electric vehicle power trading model based on blockchain and smart contracts [21].

The information records in blockchain have the characteristics of a lifelong responsibility system: once a record is completed, it is very difficult to tamper with or delete. Because of the deterrent force of this lifelong responsibility system, commercial cooperation, social behaviour, and credibility will be greatly improved. This will be of great help to the construction of future credit systems and even the progress of human civilization. Finally, blockchain technology ensures that all parties to a collaboration see the same information system, which lays a good foundation for building a wider range of social cooperation in the future.

3. Privacy Protection Model and Algorithm

In current IoV applications, C-V2X and MEC technologies enable vehicles, people, and roadside units to share information. This realizes the interaction of real-time traffic information and greatly facilitates our lives, but privacy protection in IoV has not yet received enough attention. The emergence of federated learning can effectively address these problems in current IoV applications: it not only solves the problem of data silos but also ensures the security of users' privacy. To solve the privacy problems in IoV, we propose a blockchain-based federated learning pool framework [22–26].

As shown in Figure 2, our framework consists of the federated learning pool, the adaptive learning models, and the blockchain. We use blockchain as the bottom layer of the framework to ensure the reliability of data transmission. All parameters are uploaded to the servers in the blockchain through base stations and RSUs. Each adaptive learning model can automatically select the most suitable federated learning method according to the characteristics of the data sources during federated learning. In V2V, vehicle nodes are equipped with the CPC lightweight encryption algorithm to ensure users' privacy and security while sharing information, and the homomorphic encryption algorithm is used in V2A to further ensure security. In the end, the global model is built in the servers of the blockchain from multiple federated models.

3.1. Federated Learning Pool

Given the complexity of IoV, choosing which federated learning method to use is a difficult problem. However, if the server can independently analyse the distribution characteristics of the data sources and automatically select the corresponding federated learning method, this problem becomes very simple. Therefore, we propose the concept of the federated learning pool, an adaptive federated learning module deployed on servers that selects the most suitable federated learning method according to the characteristics of the data sources. In FLP, according to the practical application scenario, design criteria such as decentralization or weak centralization, energy saving, and security are selected purposefully, and strong or weak principles are chosen to improve the efficiency of the system [27, 28]. There are three federated learning methods (horizontal federated learning, vertical federated learning, and federated transfer learning) in our scheme, and the most suitable learning method is chosen according to the actual application scenario, which protects the privacy of users.
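The following Python sketch illustrates one way such a selection could be made from the overlap between two data sources; the DataSource fields, the threshold value, and the function name choose_fl_method are illustrative assumptions and not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class DataSource:
    sample_ids: set      # identifiers of the samples a party holds
    feature_names: set   # names of the features a party holds

def choose_fl_method(a: DataSource, b: DataSource,
                     overlap_threshold: float = 0.5) -> str:
    """Return 'horizontal', 'vertical', or 'transfer' based on data overlap."""
    sample_overlap = len(a.sample_ids & b.sample_ids) / max(len(a.sample_ids | b.sample_ids), 1)
    feature_overlap = len(a.feature_names & b.feature_names) / max(len(a.feature_names | b.feature_names), 1)
    if feature_overlap >= overlap_threshold and sample_overlap < overlap_threshold:
        return "horizontal"   # same features, different samples (e.g., many vehicles)
    if sample_overlap >= overlap_threshold and feature_overlap < overlap_threshold:
        return "vertical"     # same samples, different features (e.g., vehicles + transport dept.)
    return "transfer"         # little overlap in both samples and features

# Example: two vehicle fleets sharing the same sensor features
fleet_a = DataSource({"v1", "v2"}, {"speed", "gps", "lidar"})
fleet_b = DataSource({"v3", "v4"}, {"speed", "gps", "lidar"})
print(choose_fl_method(fleet_a, fleet_b))  # -> "horizontal"
```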

3.1.1. Horizontal Federated Learning

In IoV, each vehicle is a collector of information, and all the real-time traffic information vehicles collect is an indispensable part of building the global model. The information interaction between vehicles gives drivers more information, but at the same time, users' data privacy may be leaked. In this model, we take vehicles as the clients and the servers in the blockchain as the server nodes. The vehicles download the parameters from the blockchain and conduct local model training with their local data. After training, the updated parameters, encrypted by the CPC lightweight algorithm, are returned to the blockchain. In this process, the base station and RSU aggregate the parameters returned by the vehicles to share the work of the server.

The objective function of the finite sum in this method is denoted by
$$\min_{w} F(w), \qquad F(w) = \frac{1}{n}\sum_{i=1}^{n} f_i(w),$$
where $w$ is the model parameter of the vehicle.

We take $f_i(w) = \ell(x_i, y_i; w)$, that is, the loss of the prediction on example $(x_i, y_i)$ made with the model parameter $w$ of the vehicle. We assume there are $K$ vehicle clients altogether, with $P_k$ the set of indexes of the local data on vehicle client $k$ and $n_k = |P_k|$. Then, we can rewrite the objective function as
$$F(w) = \sum_{k=1}^{K} \frac{n_k}{n} F_k(w), \qquad F_k(w) = \frac{1}{n_k}\sum_{i\in P_k} f_i(w).$$

From the above definition, we can see that the total loss function is the weighted average of the local losses of the vehicle clients, weighted by their numbers of samples. In each round of communication, a batch of gradients is computed, and a better model is built through multiple rounds of efficient iterative calculation. The method of each iteration is as follows ($t$ denotes the round of iteration).

Each vehicle client $k$ computes
$$g_k = \nabla F_k(w_t),$$
that is, the average gradient over its local data at the current model parameters $w_t$.

The base station aggregates the gradients:
$$w_{t+1} \leftarrow w_t - \eta \sum_{k=1}^{K} \frac{n_k}{n} g_k,$$
where $\sum_{k=1}^{K} (n_k/n)\, g_k = \nabla F(w_t)$ and $\eta$ represents the learning rate.

An equivalent update can be written as
$$\forall k:\; w_{t+1}^{k} \leftarrow w_t - \eta g_k, \qquad w_{t+1} \leftarrow \sum_{k=1}^{K} \frac{n_k}{n} w_{t+1}^{k};$$
that is, the vehicle clients send their locally updated parameters to the base station after several rounds of updating, and the base station then computes the weighted average of these parameters.

Algorithms 1 and 2 show the details of horizontal federated learning.

Input: update parameters encrypted by CPC
Output: updated parameters of the vehicles
 1: Decrypt the received parameters;
 2: Initialize w_0;
 3: for each round t = 1, 2, … do
 4:  e ⟵ max((Percent_vehicle) · (K), 1);
 5:  // Percent_vehicle is the percentage of vehicles selected each round.
 6:  C_t ⟵ (random set of e vehicles);
 7:  for each vehicle k ∈ C_t in parallel do
 8:   w_{t+1}^k ⟵ VehicleUpdate(k, w_t); // Algorithm 2
 9:  end for
 10:  w_{t+1} ⟵ Σ_{k=1}^{K} (n_k/n) · w_{t+1}^k;
 11: end for
Input: DB_vehicle
Output: updated local parameters w
 1: B ⟵ (split DB_vehicle into batches of size G); // G is the local minibatch size of the vehicle.
 2: for each local epoch i from 1 to I_vehicle do // I_vehicle is the number of local epochs.
 3:  for batch b ∈ B do
 4:   w ⟵ w − η ∇ℓ(w; b);
 5:  end for
 6:  Use CPC to encrypt w;
 7:  return w to the base station
 8: end for
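To make the data flow above concrete, the following is a minimal, self-contained Python sketch of Algorithms 1 and 2. The toy linear model, the function names local_update and aggregate, and the learning-rate value are illustrative assumptions, and the CPC encryption of the returned parameters is omitted for brevity.

```python
import random

def local_update(w, data, lr=0.02, epochs=1, batch_size=2):
    """Algorithm 2 (sketch): vehicle-side minibatch SGD on a toy model y = w*x."""
    w = float(w)
    for _ in range(epochs):
        random.shuffle(data)
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * grad
    return w

def aggregate(updates, sizes):
    """Algorithm 1 (sketch): base station computes the weighted average."""
    total = sum(sizes)
    return sum(w * n for w, n in zip(updates, sizes)) / total

# Two vehicles whose local data follow y = 3x
vehicle_data = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
w_global = 0.0
for t in range(20):                                   # communication rounds
    updates = [local_update(w_global, d[:]) for d in vehicle_data]
    w_global = aggregate(updates, [len(d) for d in vehicle_data])
print(round(w_global, 2))                             # converges toward 3.0
```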
3.1.2. Vertical Federated Learning

The driver assistance systems in IoV can provide a range of support for drivers while driving, including lane keeping assistance, automatic parking assistance, brake assist, and automatic driving. These systems improve the driving experience and can even avoid disasters in dangerous situations. However, most manufacturers' driver assistance systems rely mainly on their own sensors, cameras, and algorithms. Such an approach can also train models, but the resulting model lacks other important data and is therefore not perfect. Therefore, we apply vertical federated learning to complex road conditions or accident-prone areas. The data collected by vehicle sensors in a given area differ in characteristics and parameters from the data owned by the transportation department. The vehicles and the transportation department use the data with different characteristics in the two data sets to train local models and then send the parameters to the base station and RSU servers for gradient aggregation. In the end, the updated parameters are sent to the servers in the blockchain to build a federated model.

The data sets owned by the vehicles and the transportation department are different. We assume that $T$ data owners $\{P_1, \dots, P_T\}$ are involved in building the model and that they collaboratively train a machine learning model based on $N$ data samples $\{\mathbf{x}_i\}_{i=1}^{N}$ whose feature vectors are distributed among the $T$ parties, $\mathbf{x}_i = (\mathbf{x}_{i,1}, \dots, \mathbf{x}_{i,T})$, where $d_t$ is the feature dimension held by party $P_t$. Without loss of generality, we assume $P_T$ holds the data labels $\{y_i\}_{i=1}^{N}$. We denote the data set of each party as $\mathcal{D}_t = \{\mathbf{x}_{i,t}\}_{i=1}^{N}$ for $t = 1, \dots, T-1$ and $\mathcal{D}_T = \{(\mathbf{x}_{i,T}, y_i)\}_{i=1}^{N}$. Then, the vertical learning model can be described as
$$\min_{\rho_1,\dots,\rho_T}\; \frac{1}{N}\sum_{i=1}^{N} f(\rho_1,\dots,\rho_T;\, \mathbf{x}_i, y_i) + \lambda \sum_{t=1}^{T} r(\rho_t),$$
where $\rho_t$ denotes the training parameters of the $t$-th party, $f$ and $r$ denote the loss function and regularizer, and $\lambda$ is the regularization coefficient. For a linear model, the loss function can be described as follows:
$$f(\rho_1,\dots,\rho_T;\, \mathbf{x}_i, y_i) = \ell\Big(y_i,\; \textstyle\sum_{t=1}^{T} \mathbf{x}_{i,t}^{\top}\rho_t\Big).$$

The objective is for each party $P_t$ to find its own $\rho_t$ without sharing its data set $\mathcal{D}_t$ or parameter $\rho_t$ with the other parties.

Next, we describe the way the parameters are updated. If a minibatch $\mathcal{S} \subseteq \{1, \dots, N\}$ of data is sampled, the stochastic gradient with respect to $\rho_t$ is given by
$$g_t = \frac{1}{|\mathcal{S}|}\sum_{i\in\mathcal{S}} \nabla_{\rho_t} f(\rho_1,\dots,\rho_T;\, \mathbf{x}_i, y_i) + \lambda \nabla r(\rho_t).$$

For an arbitrary loss function, let $\xi_{i,j\to t}$ be a function summarizing the information required from data owner $P_j$ by $P_t$ for sample $i$ (for the linear model above, $\xi_{i,j\to t} = \mathbf{x}_{i,j}^{\top}\rho_j$), and define the collection of information needed to calculate the loss function at $P_t$ as
$$\mathcal{H}_i^{t} = \{\xi_{i,j\to t}\}_{j\neq t}.$$

Based on the descriptions above, the stochastic gradient can be computed by (7) as
$$g_t = \frac{1}{|\mathcal{S}|}\sum_{i\in\mathcal{S}} \nabla_{\rho_t} f(\rho_t;\, \mathcal{H}_i^{t}, \mathbf{x}_{i,t}, y_i) + \lambda \nabla r(\rho_t).$$

In each iteration, the following formula is used to update the parameters of the $t$-th data party ($\eta$ is the learning rate):
$$\rho_t \leftarrow \rho_t - \eta\, g_t.$$

Because $\mathcal{H}_i^{t}$ is the intermediate information obtained at the most recent synchronization, it may contain stale information and may therefore no longer give an unbiased estimate of the true partial gradient. On the other hand, during the local updates, no interparty communication is required. In the same spirit, a sequential version of the algorithm allows the parties to update their local $\rho_t$ one after another, where each update consists of several local updates without interparty communication.

Algorithm 3 shows the details of vertical federated learning.

Input: η
Output: ρ_1, ρ_2, …, ρ_T
 1: Each party P_t (t = 1, 2, …, T) initializes ρ_t;
 2: The parties P_t (t = 1, 2, …, T) exchange their summaries;
 3: for each iteration x = 1, 2, … do
 4:  Randomly sample a minibatch S;
 5:  for each party t = 1, 2, …, T sequentially do
 6:   for each local iteration r = 1, 2, …, Q do
 7:    P_t computes g_t using (9);
 8:    Update ρ_t ⟵ ρ_t − η g_t;
 9:   end for
 10:   The parties P_t (t = 1, 2, …, T) exchange their summaries;
 11:  end for
 12: end for
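As a concrete illustration of this exchange, the toy Python sketch below (the variable names and the noiseless linear-regression setup are assumptions, and NumPy arrays stand in for the encrypted transport) splits the features of one data set between two parties; only the per-sample partial predictions, i.e., the summaries, are shared, never the raw feature columns held by the other party.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X1 = rng.normal(size=(n, 2))           # features held by party 1 (e.g., vehicles)
X2 = rng.normal(size=(n, 3))           # features held by party 2 (e.g., transport dept.)
true_w = rng.normal(size=5)
y = X1 @ true_w[:2] + X2 @ true_w[2:]  # labels held by party 2

w1, w2 = np.zeros(2), np.zeros(3)
lr = 0.1
for it in range(500):
    idx = rng.choice(n, size=32, replace=False)         # minibatch S
    s1 = X1[idx] @ w1                                   # summary sent by party 1
    s2 = X2[idx] @ w2                                   # summary sent by party 2
    residual = (s1 + s2) - y[idx]                       # computed at the label holder
    w1 -= lr * X1[idx].T @ residual / len(idx)          # each party updates its own block
    w2 -= lr * X2[idx].T @ residual / len(idx)
print(np.round(np.concatenate([w1, w2]) - true_w, 2))   # parameter error near zero
```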
3.1.3. Federated Transfer Learning

Federated transfer learning is suitable for learning from different data sets in different feature spaces. It maps the features of the different feature spaces into the same latent representation and trains models with the labels contained in the labelled data collected by the different parties. The goal is to use federated transfer learning to solve the problem of lacking data and labels while protecting privacy. It can be applied not only to two sample spaces but also to two different data sets. Federated transfer learning selects the overlapping data and then conducts sample alignment, which helps the party with well-labelled data build an improved model. Based on this model, the other parties predict the missing characteristics of their samples and modify the model.

In IoV, the distributions of the data collected by vehicles and the data owned by the transportation department meet the above conditions well. The data collected by vehicles usually concern the road conditions in a certain region, while the data owned by the transportation department contain other data besides that. If data leakage occurs in the process of interaction between the two parties training the model, it will cause very serious problems. According to the distribution characteristics of the vehicle data sets and the transportation department's data sets, we choose federated transfer learning to build a model and protect privacy.

Let us denote the data source of the vehicles as V, the data source of the transportation department as T, the RSU servers as Server-RSU, and the servers in the blockchain as Server-BC. Firstly, the hidden neural networks of V and T are established by an end-to-end solution. We initialize the parameters to be passed by V and T and use a prediction function to mark the target domain and minimize the alignment loss between V and T. Thus, we can express the update function of Server-RSU and calculate the update gradients that Server-RSU passes back to V and T to update and optimize their existing local models. To meet the security and privacy requirements of federated transfer learning, the original data of V and T must remain hidden, so we use a fully homomorphic encryption method to protect privacy. V and T generate their public keys and encrypt the parameters that need to be passed between the data parties. At the same time, the private keys are used to decrypt locally and obtain the training information, which is then uploaded to Server-BC. The notations used in Algorithm 4 are listed in Table 1.

Input: S_V, S_T
Output: l_Server-RSU
 1: V and T do:
 2:  Initialize the parameters to be passed by V and T;
 3:  Distribute the public keys τ_V and τ_T;
 4:  Use fully homomorphic encryption to encrypt the parameters;
 5:  Send them to Server-RSU;
 6: Server-RSU do:
 7:  Distribute the public key τ_RSU;
 8: while the model is not convergent do
 9:  Server-RSU executes:
 10:   Decrypt the received parameters;
 11:   Calculate l_Server-RSU, ∂_V, and ∂_T;
 12:   Use fully homomorphic encryption to encrypt ∂_V and ∂_T;
 13:   Send [∂_V] to V and send [∂_T] to T;
 14:   Send l_Server-RSU to Server-BC;
 15:  V do:
 16:   Decrypt [∂_V] locally;
 17:   Calculate γ_V;
 18:   Use fully homomorphic encryption to encrypt γ_V;
 19:   Send [γ_V] to Server-RSU;
 20:  T do:
 21:   Decrypt [∂_T] locally;
 22:   Calculate γ_T;
 23:   Use fully homomorphic encryption to encrypt γ_T;
 24:   Send [γ_T] to Server-RSU;
 25: end while
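As a very simplified illustration of the encrypted exchange in Algorithm 4, the Python sketch below uses the additively homomorphic Paillier scheme from the python-paillier package (phe) as a stand-in: the framework itself specifies fully homomorphic encryption, and the key distribution shown here (T encrypting under V's public key and V decrypting the aggregate) is an assumption made only to keep the example short.

```python
from phe import paillier

pub_v, priv_v = paillier.generate_paillier_keypair()   # key pair of party V

grad_v = [0.12, -0.05, 0.40]          # gradient computed locally by V
grad_t = [0.08,  0.15, -0.10]         # gradient computed locally by T

enc_v = [pub_v.encrypt(g) for g in grad_v]
enc_t = [pub_v.encrypt(g) for g in grad_t]             # T encrypts under V's public key

# Server-RSU adds the ciphertexts component-wise; it never sees the plaintexts.
enc_sum = [a + b for a, b in zip(enc_v, enc_t)]

aggregated = [priv_v.decrypt(c) for c in enc_sum]      # decrypted locally by V
print([round(x, 2) for x in aggregated])               # [0.2, 0.1, 0.3]
```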
3.2. Federated Learning Model

In this section, we will describe the details of our lightweight encryption algorithm and the adaptive learning models.

3.2.1. CPC Lightweight Encryption Algorithm

Although federated learning has brought many public benefits (such as protecting privacy and breaking data silos), it still has some drawbacks. In federated learning, although users' data never leave the device and only the model parameters and gradients returned by users are transmitted over the channel, these gradients carry almost all the information of the users' data, and the users' information can be inferred through reverse engineering and other methods. Moreover, both the server and the workers can repeatedly obtain the model parameters after each iteration, which makes it easier to infer users' data in federated learning. Therefore, based on the Feistel cipher structure [29], we propose a new lightweight encryption algorithm called CPC. The CPC lightweight encryption algorithm has fewer encryption rounds, simpler conversions, and more efficient permutations. Considering the poor performance of the vehicle system, and to reduce the encryption cost of vehicle nodes, the algorithm performs only eight rounds of encryption, in which the input is a 128-bit plaintext block and a 256-bit master key. Figure 3 shows the process of our encryption algorithm. We divide the plaintext into four subblocks and use the master key to derive a subkey for each round, and four new subblocks are output in each round of conversion.

(1) The Generation of the Master Key. To encrypt messages, each sensor node generates an encryption key called K_enc, which is shared with the base station. In this section, we present the generation process of this key, which will be used in the CPC encryption process. Table 2 shows the parameters used in the key generation process.

Before deployment, the base station predistributes a set of keys to each sensor node in the network. Our asymmetric keys are based on the elliptic curve cryptography (ECC) algorithm, which has been proved to provide a level of security for shared keys that is equivalent to, or even higher than, that of other asymmetric systems. The base station creates an asymmetric key pair unique to each node in the network. For example, a node N will have a pair of unique public-private keys (Pub_N, Priv_N). Priv_N is selected in the interval [1, m − 1], where m is a parameter of the ECC curve, and is considered the private key of node N. The public key of N is obtained by the following scalar multiplication:
$$Pub_N = Priv_N \cdot G,$$
where $G$ is the base point on the curve.

Each sensor node stores its public and private keys, its identity ID, and the public key of the base station: {Pub_N, Priv_N, ID, Pub_BS}. The base station stores its own public and private keys and all the public keys of the different sensors in the network.

After the sensors are deployed, each node must create a master encryption key to encrypt messages in the encryption process. Our master encryption key consists of 256 bits and is built from two components (Figure 4). The first component is the first 128 bits of the output of the ECC cipher. The second component is the 64-bit node identifier ID. Once these components are generated, they are combined and hashed with SM3 to generate the 256-bit master encryption key.

After the initial deployment is completed, the identity authentication and key distribution stages are carried out. We use the asymmetric keys preassigned before node deployment to share the key between each sensor node and the base station. In the key generation process, we use the Diffie–Hellman mechanism [30]. The main idea is that each node in the network can create a shared key by using the parameters of ECC without interacting with the base station, which means that no message is exchanged. The generation process of K_enc is as follows.

The base station (BS) calculates a temporary key:
$$K_{temp} = Priv_{BS} \cdot Pub_{N}.$$

Node N calculates a temporary key:
$$K'_{temp} = Priv_{N} \cdot Pub_{BS}.$$

According to the Diffie–Hellman mechanism,
$$K_{temp} = Priv_{BS} \cdot Priv_{N} \cdot G = Priv_{N} \cdot Pub_{BS} = K'_{temp}.$$

We combine $K_{temp}$ and ID and then hash the merged result using SM3 (a hash function) to generate our encryption key (a 256-bit output):
$$K_{enc} = SM3(K_{temp} \,\|\, ID).$$

Finally, the node N sends its main encryption key to the base station safely in a message containing authentication information.

After these processes, K_enc is stored in the memory of the sensor node. With a length of 256 bits, this key cannot feasibly be figured out by an attacker, and it is kept secret during the encryption process.
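For illustration, the Python sketch below performs the same style of key agreement with the cryptography package; the curve choice, the node identifier, and the use of SHA-256 in place of SM3 (which is not in the Python standard library) are assumptions made only for this example.

```python
import hashlib
from cryptography.hazmat.primitives.asymmetric import ec

curve = ec.SECP256R1()
node_priv = ec.generate_private_key(curve)     # Priv_N (kept secret by the node)
bs_priv = ec.generate_private_key(curve)       # Priv_BS (kept secret by the base station)

# Each side computes the same temporary key from its private key and the
# other side's public key (scalar multiplication on the curve).
k_node = node_priv.exchange(ec.ECDH(), bs_priv.public_key())
k_bs = bs_priv.exchange(ec.ECDH(), node_priv.public_key())
assert k_node == k_bs

node_id = b"node-0001"                               # stand-in for the 64-bit ID
k_enc = hashlib.sha256(k_node + node_id).digest()    # 256-bit master key K_enc
print(len(k_enc) * 8)                                # 256
```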

(2) Generation of the Subkey. We use the master key to compute eight subkeys Key_1, …, Key_8, where Key_i is used in the i-th round of encryption. In the subkey generation process, we design the calculations so that different subkeys have different characteristics when facing a particular attack. In the initialization phase, we split the 256-bit master key into eight equal-length 32-bit blocks K_1, …, K_8. Each subkey is generated from different master key blocks and different exclusive-OR operations, and the function Inv is used to reverse the bit positions of a block. The generation details of the different subkeys are given in Algorithm 5.

1: Split the 256-bit main encryption key K_enc into 8 blocks of 32 bits: K1, K2, …, K8
2: The 8 subkeys (Key1, Key2, …, Key8) are computed as follows:
3:  Key1 := K1 ⊕ [Inv(K2) ⊕ Inv(K3)]
4:  Key2 := Key1 ⊕ [Inv(K3) ⊕ Inv(K4)]
5:  Key3 := Key2 ⊕ [Inv(K4) ⊕ Inv(K5)]
6:  Key4 := Key3 ⊕ Key2 ⊕ Key1
7:  Key5 := Key4 ⊕ [Inv(K5) ⊕ Inv(K6)]
8:  Key6 := Key5 ⊕ [Inv(K6) ⊕ Inv(K7)]
9:  Key7 := Key6 ⊕ [Inv(K7) ⊕ Inv(K8)]
10:  Key8 := Key7 ⊕ Key6 ⊕ Key5
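A small Python sketch of Algorithm 5 is given below; since the text does not fully specify Inv, it is taken here to be a 32-bit bit reversal, and the big-endian block split is likewise an assumption.

```python
def inv(x: int) -> int:
    """Assumed Inv: reverse the 32 bits of x."""
    return int(f"{x:032b}"[::-1], 2)

def derive_subkeys(master_key: bytes) -> list:
    assert len(master_key) == 32                 # 256-bit master key K_enc
    k = [int.from_bytes(master_key[i:i + 4], "big") for i in range(0, 32, 4)]
    keys = [0] * 8
    keys[0] = k[0] ^ (inv(k[1]) ^ inv(k[2]))
    keys[1] = keys[0] ^ (inv(k[2]) ^ inv(k[3]))
    keys[2] = keys[1] ^ (inv(k[3]) ^ inv(k[4]))
    keys[3] = keys[2] ^ keys[1] ^ keys[0]
    keys[4] = keys[3] ^ (inv(k[4]) ^ inv(k[5]))
    keys[5] = keys[4] ^ (inv(k[5]) ^ inv(k[6]))
    keys[6] = keys[5] ^ (inv(k[6]) ^ inv(k[7]))
    keys[7] = keys[6] ^ keys[5] ^ keys[4]
    return keys

subkeys = derive_subkeys(bytes(range(32)))       # toy 256-bit key for illustration
print([hex(s) for s in subkeys])
```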

(3) Progress of Encryption. The content of each round of the encryption process is shown in Figure 5. Each round contains four different operations: the XOR operation, function f1, function f2, and a permutation operation. The 128-bit plaintext M is divided into four 32-bit blocks. Four new bit sequences are generated in each round, and then the next round of encryption begins.

Algorithm 6 shows the details of encryption.

Input: M, K_enc
Output: C
 1: Divide M ⟶ M1, M2, M3, M4
 2: Generate the different subkeys Key_j from the main key K_enc
 3: for each round j = 1 : 8 do
 4:  X := Mj ⊕ Mj+1
 5:  Y := Mj+1 ⊕ Mj+3 ⊕ Keyj
 6:  Count1 := Count-Zero(X)
 7:  W := f1(Count1, X)
 8:  R := W ⊕ Y
 9:  Count2 := Count-Zero(R)
 10:  S := f2(Count2, R)
 11:  Mj+4 := S
 12:  Permutation: (Mj+1, Mj+2, Mj+3)
 13: end for
 14: C := M9 ∥ M10 ∥ M11 ∥ M12
 15: Output C

At the end of each round, three blocks are permuted to update the order of the encrypted text blocks. Algorithm 7 shows the permutation function used in the encryption process.

 1: Let tr be a temporary variable
 2: tr := Mj+2
 3: The permutation operations are the following:
 4:  Mj+2 := Mj+3
 5:  Mj+3 := Mj+1
 6:  Mj+1 := tr

At the end of the eighth round of encryption, the generated ciphertext C is defined as follows:
$$C = M_9 \,\|\, M_{10} \,\|\, M_{11} \,\|\, M_{12}.$$

(4) Progress of Decryption. When the base station receives the ciphertext together with the authentication message sent by a node, it separates them and then starts the decryption process, using the shared key to decrypt the ciphertext and obtain the plaintext. The decryption process is shown in Figure 6.

In the Feistel cryptosystem, decryption is the inverse of encryption. It uses the same functions and parameters as the encryption process; there is no essential difference between the encryption and decryption calculations, but the subkey sequence is used in the reverse order. We assume that the sequence of blocks produced in the encryption process is known, and Figure 7 shows the main steps of each round of the decryption process.

The permutation function in the process of decryption is shown in Algorithm 8.

 1: Let dr be a temporary variable
 2: dr := Cj+1
 3: The permutation operations are the following:
 4:  Cj+1 := Cj+3
 5:  Cj+3 := Cj+2
 6:  Cj+2 := dr
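As a quick sanity check of Algorithms 7 and 8, the short Python sketch below applies the encryption permutation and then the decryption permutation to three toy 32-bit blocks and confirms that the original order is restored (the tuple representation is used only for illustration).

```python
def permute_enc(b1, b2, b3):
    # Algorithm 7: tr := b2; b2 := b3; b3 := b1; b1 := tr
    return b2, b3, b1

def permute_dec(b1, b2, b3):
    # Algorithm 8: dr := b1; b1 := b3; b3 := b2; b2 := dr
    return b3, b1, b2

blocks = (0x11111111, 0x22222222, 0x33333333)
assert permute_dec(*permute_enc(*blocks)) == blocks
print("decryption permutation inverts the encryption permutation")
```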

The plaintext is defined as follows:
$$M = C_1 \,\|\, C_2 \,\|\, C_3 \,\|\, C_4.$$

The process of decryption is shown in Algorithm 9.

Input: C, K_enc
Output: M
 1: Receive C
 2: Generate the 8 subkeys Key_j from the main key K_enc
 3: Partition C into 4 segments: C9, C10, C11, C12
 4: for each round j = 8 : 1 do
 5:  Permutation: (Cj+1, Cj+2, Cj+3)
 6:  X := Cj+4 ⊕ Cj+1
 7:  Y := Cj+2 ⊕ Cj+3 ⊕ Keyj
 8:  Count1 := Count-Zero(X)
 9:  W := f1(Count1, X)
 10:  R := W ⊕ Y
 11:  Count2 := Count-Zero(R)
 12:  S := f2(Count2, R)
 13:  Cj := S
 14: end for
 15: M := C1 ∥ C2 ∥ C3 ∥ C4
 16: Output M
3.2.2. Adaptive Model

In our federated learning framework, there are three adaptive learning models, V2V, V2A, and C2A, defined according to the characteristics and scenarios of the data sources, and the servers in all three models are the servers in the blockchain. In our models, adaptation is mainly reflected in two aspects. On the one hand, the system allows different federated learning methods to meet the application requirements of different scenarios, and users can assign the initial learning mechanism through APIs. On the other hand, the system supports upgrading the federated learning methods at any time to achieve hot upgrading of the federated learning mechanism (a minimal registration sketch is given at the end of this subsection):

(1) V2V. In this model, the vehicles act as the worker nodes, and the CPC lightweight encryption algorithm is applied in the vehicle system.
Step 1: the vehicles download the parameters from the blockchain and conduct local training.
Step 2: when the training is completed, the gradient is encrypted with the CPC encryption algorithm and returned to the base station and RSU servers.
Step 3: the base station and RSU servers conduct gradient aggregation and upload the updated parameters.
Step 4: the servers in the blockchain choose horizontal federated learning to build a model through the FLP module according to the distribution characteristics of the data sources.

(2) C2A. In this model, the motor corporations and the transportation department act as the worker nodes.
Step 1: the motor corporations and the transportation department download the parameters from the blockchain and conduct local training.
Step 2: when the training is completed, the gradient is returned to the base station and RSU servers.
Step 3: the base station and RSU servers conduct gradient aggregation and upload the updated parameters.
Step 4: the servers in the blockchain choose vertical federated learning to build a model through the FLP module according to the distribution characteristics of the data sources.

(3) V2A. In this model, the vehicles and the transportation department act as the worker nodes, and the fully homomorphic encryption algorithm is applied to protect privacy while the transportation department shares data.
Step 1: the vehicles and the transportation department download the parameters from the blockchain and conduct local training.
Step 2: when the training is completed, the gradient of the transportation department is encrypted by FHE and returned to the base station and RSU servers.
Step 3: the base station and RSU servers aggregate the gradients of the transportation department and the vehicles and upload the updated parameters.
Step 4: the servers in the blockchain choose federated transfer learning to build a model through the FLP module according to the distribution characteristics of the data sources.

In the end, the global model is built in the blockchain with multiple federated models.
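The following Python sketch (the registry structure and the two averaging functions are assumptions, not the authors' implementation) illustrates how an adaptive model might register a federated learning method and hot-upgrade it at runtime without restarting the server.

```python
from typing import Callable, Dict

registry: Dict[str, Callable] = {}

def register(model: str, method: Callable) -> None:
    """Assign or hot-upgrade the FL routine used by an adaptive model."""
    registry[model] = method

def run_round(model: str, updates: list) -> float:
    return registry[model](updates)

def plain_average(updates):           # initial V2V mechanism: simple averaging
    return sum(updates) / len(updates)

def weighted_average(updates):        # an upgraded variant weighting later updates more
    weights = range(1, len(updates) + 1)
    return sum(u * w for u, w in zip(updates, weights)) / sum(weights)

register("V2V", plain_average)
print(run_round("V2V", [0.2, 0.4, 0.6]))   # 0.4
register("V2V", weighted_average)          # hot upgrade, no restart needed
print(run_round("V2V", [0.2, 0.4, 0.6]))   # ~0.467
```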

4. Experiment and Analysis

In this section, we will discuss the security of our model and analyse the results of simulated experiments. We tested and compared the performance efficiency and storage cost of the CPC lightweight encryption algorithm with other encryption algorithms and analysed the advantages of the FLP module compared to traditional distributed machine learning based on the logistic regression model in the simulated environment of IoV. We also used caliper to test the performance of our system. In the end, we analysed the security and privacy of our proposed framework from the perspective of potential attacks. The details are as follows.

In our experiments, we used the TOSSIM simulator, which is designed for wireless sensor networks (WSNs) [31], to build the vehicle sensor-base station model. To measure the execution time of CPC and other encryption algorithms, we tested a variety of encryption algorithms in the TinyOS [17] system through many experiments, including CPC, TEA [32], XTEA [33], and AES [34], all of which are used for low-resource embedded devices or mobile devices. Novelan et al. gave an implementation of the encryption algorithm TEA. TEA adopts a simplified scrambling function and short data packets, which can greatly reduce the cost of decryption and encryption; however, it presents poor key agility (the time from generating or importing a new key to starting encryption is not negligible) and increases the burden of the gateway node. XTEA is the latest variant of TEA, a block cipher with a 128-bit key, a 64-bit block, and a 64-round Feistel structure. XTEA improves the network lifetime, but memory size remains a big challenge for this algorithm. Compared with the former algorithms, AES reduces the storage of keys in node memory by using master keys in key establishment, but its resistance to attack is low, since the master key can be compromised at any time, and the different keys established after deployment using this key can then be compromised too.

It is extremely important for critical applications in sensor networks to have a security mechanism that ensures authentication and confidentiality. However, optimal security in this type of network is particularly difficult to achieve due to the limitation of node resources. Therefore, we compared our algorithm with the above algorithms through experiments. For the comparison, two crucial parameters were selected: performance efficiency and storage cost. Then, we used the Spark analysis platform to simulate the IoV environment and analysed and predicted the data. The specific experimental environment is shown in Table 3.

Firstly, we measured the performance efficiency of the CPC algorithm. Because the decryption time is the same as the encryption time, we only measured the execution time of encryption. We used each of the above encryption algorithms to encrypt vehicle sensor data plaintexts of different lengths 50 times and recorded the execution time. Figure 8 shows the time taken by the four algorithms to encrypt the different plaintexts.
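A minimal sketch of this kind of timing harness is shown below; the stand-in XOR cipher, the key, and the plaintext lengths are placeholders for illustration only, since the actual measurements were made for CPC, TEA, XTEA, and AES inside TinyOS/TOSSIM.

```python
import time

def toy_encrypt(plaintext: bytes, key: bytes) -> bytes:
    # placeholder cipher: XOR with a repeating key, just to make the harness runnable
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(plaintext))

def measure(cipher, plaintext: bytes, key: bytes, runs: int = 50) -> float:
    """Encrypt `runs` times and return the average time per encryption."""
    start = time.perf_counter()
    for _ in range(runs):
        cipher(plaintext, key)
    return (time.perf_counter() - start) / runs

key = bytes(range(32))
for length in (64, 128, 256, 512):                 # plaintext lengths in bytes
    avg = measure(toy_encrypt, bytes(length), key)
    print(f"{length:4d} bytes: {avg * 1e6:.1f} microseconds per encryption")
```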

As shown in Figure 8, the encryption time required by the CPC algorithm is shorter than that of the other tested algorithms, and we can conclude that the execution speed of CPC is much higher. This is due to CPC's use of simple XOR and scrambling functions and its smaller number of encryption rounds.

Then, we measured the storage cost of the CPC algorithm. Figure 9 presents the comparison of the memory space occupied by the four algorithms. During the simulation, only the encryption process was measured.

As shown in Figure 9, the TEA algorithm uses the largest S-box and can encrypt data blocks of at most 64 bits, so it occupies more memory than CPC. The XTEA algorithm also requires more memory than CPC to execute the encryption process because it contains more rounds. Therefore, CPC has an advantage over the other three algorithms.

The information obtained by sensors has certain limitations because the actual driving of a vehicle is restricted in time and space. The application of federated learning effectively improves the robustness of the machine learning model in IoV. In the following experiments, we compared the model and algorithm proposed in this paper with a traditional distributed machine learning algorithm based on the logistic regression model. Based on the analysis of IoV data sets of multiple orders of magnitude, we predicted the road conditions for the following week. Meanwhile, we tested the success rate of avoiding obstacles. The test results are shown in Figures 10 and 11.

As shown in Figures 10 and 11, in the tests of predicting road conditions and avoiding obstacles, the success rates of both algorithms improve as the magnitude of the input data set increases. At the same data scale, the success rates of the two tests based on FLP are higher than those of the traditional distributed machine learning algorithm based on the logistic regression model, which means our model is better in terms of accuracy.

Then, we measured the average computation time of FLP and the traditional distributed machine learning model in the two experiments mentioned above. Figure 12 shows the results.

As shown in Figure 12, with the same data scale, FLP takes about 15% less time than the traditional distributed machine learning algorithm based on the logistic regression model, which means our model is better in terms of effectiveness.

Due to its dependence on massive data and its mobility, the performance of the blockchain-driven IoV network is quite important; the relevant indexes include latency, energy consumption, throughput, and scalability. In our experiment, we used Caliper to test the performance. Caliper is a blockchain performance testing framework that currently supports testing of throughput (TPS), latency, and resource utilization. After each round of testing, users can obtain a series of test results and reports from Caliper.

As shown in Figure 13, the throughput increased steadily with the number of transactions. It reached its peak when the number of transactions reached 5000, where the throughput was 296.4 TPS and the average latency was 215.4 ms; it then began to decline slowly as the number of transactions exceeded 5000. At present, there is no national standard for blockchain performance indicators, and the China Academy of Information and Communications Technology is actively formulating one. According to the existing blockchain industry standards (Table 4), the performance of our system meets the requirements.

In the edge network, on the one hand, due to the limitation of computation and bandwidth resources and the distributed characteristics of the network structure, effectively mining and utilizing the multisource, heterogeneous, and distributed data in the network has always been a research focus. On the other hand, data sharing faces a serious risk of privacy disclosure: once a data provider shares its data, it loses control of the data and faces higher security risks. The application of a licensed (permissioned) blockchain establishes a secure cooperation mechanism among distrusting parties. By embedding FLP into the consensus protocol of the licensed blockchain, several security risks can be alleviated:

(i) Removing centralized trust entities: traditional access control models based on RBAC [35] and ABAC [36] rely on a central server to manage and judge permissions, which can easily be attacked by illegal users and cause data and file leakage. The licensed blockchain replaces trusted centralized management servers and connects each participant through multiparty data retrieval. In the proposed blockchain-driven data sharing scheme, centralized trust entities are no longer needed. Our model adopts an access control policy based on smart contracts to ensure access effectiveness. The smart contracts are stored in the blockchain and cannot be changed once published, and they execute automatically once their conditions are met. Therefore, smart contracts ensure the transparency and nontampering of the access control policy, which reduces the risk of data leakage caused by centralized trust.

(ii) Ensuring the quality of shared data: to prevent dishonest data providers from sharing invalid data, the consensus mechanism based on the federated learning pool verifies the quality of the data models learned by other data providers, and only qualified data sets and models are retained.

(iii) Safe data management: only data retrieval information is uploaded to the licensed blockchain, while the real data are stored locally. Data owners can control their data permissions by changing the retrieval information. Meanwhile, in our framework, the gradient parameters of the federated learning module cannot be modified once they are uploaded. Therefore, without the authorization of the administrator, an attacker can neither obtain plaintext nor modify data. The global model parameters are stored in the distributed file system after being encrypted, and the summary information for each group of data is recorded on the block. Each block contains a timestamp and the hash value of the previous block, which ensures that data cannot be tampered with (a small sketch of this hash chaining follows this list). We also use lightweight authentication and a homomorphic encryption algorithm [37]. As long as users do not leak their private keys, even if a message returned from the server is intercepted by malicious users, its meaning cannot be inferred, which effectively protects users' privacy.
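The tamper-evidence property referred to above can be sketched in a few lines of Python; the field names and payloads here are illustrative only.

```python
import hashlib, json, time

def block_hash(block: dict) -> str:
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def append_block(chain: list, payload: dict) -> None:
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"timestamp": time.time(), "prev_hash": prev, "payload": payload})

def verify(chain: list) -> bool:
    # every block must store the hash of the block before it
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

chain = []
append_block(chain, {"model_update": "round-1 summary"})
append_block(chain, {"model_update": "round-2 summary"})
print(verify(chain))                                  # True
chain[0]["payload"]["model_update"] = "forged"
print(verify(chain))                                  # False: tampering is detected
```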

5. Conclusions

In this paper, we propose a blockchain-based federated learning pool framework to address data silos and data privacy disclosure in IoV and design a lightweight encryption algorithm called CPC to combine with it. In our framework, the federated learning pool module can select the appropriate federated learning method according to the distribution of the data sources, which makes building the model more accurate and faster. Meanwhile, the application of the CPC lightweight encryption algorithm further ensures the security of data interaction between vehicles. Besides, we use the blockchain as the bottom layer to establish a trusted mechanism, which ensures the reliability of data transmission. Finally, we conducted experiments to verify our framework, and the results show that our framework is more efficient, secure, and accurate.

At present, our framework is not perfect. In the future, we will improve the computational efficiency and accuracy of the federated learning pool and improve the federated learning methods in the pool so that our model can adapt to more scenarios. The CPC lightweight encryption algorithm can reduce the workload of the vehicle system; however, as the amount of data rises, the cost will also rise, so we will mainly improve the energy consumption and storage cost of our CPC lightweight encryption algorithm. In addition, we will also improve the stability of the blockchain to make BFLP more efficient and stable.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.