Abstract

In recent years, advances in electric vehicle (EV) technology and rising petrol prices have increased the demand for EVs and made them important to the Smart Grid (SG) economy. During periods of high energy demand, Vehicle-to-Grid (V2G) offers a notable capability: returning the energy stored in EV batteries to the grid. However, due to the dynamic nature of energy prices and EV availability, determining the best charging and discharging strategy is difficult. Existing approaches need a model to predict the uncertainty and optimize the scheduling problem. Further, other issues such as security, scalability, and low-cost real-time access to EV energy trading (ET) data also exist. Although many solutions exist, they are not adequate to handle the aforementioned issues. This paper proposes a Secure V2G Energy Trading (SV2G-ET) scheme using deep Reinforcement Learning (RL) and Ethereum Blockchain Technology (EBT). The proposed SV2G-ET scheme employs a deep Q-network to schedule EV charging/discharging. The SV2G-ET scheme uses the InterPlanetary File System (IPFS) and smart contracts (SCs) for secure, real-time access to EVs' ET data. The experimental results prove the efficacy of the proposed SV2G-ET scheme, which leads to improved scalability, lower EV charging cost, low ET data storage cost, and increased EV owners' profit.

1. Introduction

Electric vehicles (EVs) have become popular due to several environmental and economic advantages over fossil fuel cars, such as reduced pollution, greenhouse gas emissions, and operating cost [1]. In modern transportation systems, EVs are the preferred choice due to their energy and environmental sustainability, although they often fail to find suitable charging stations due to the limited infrastructure of the Smart Grid (SG) and unbalanced charging demand. The SG offers bidirectional communication with distributed computers to increase the grid's quality-of-service, efficiency, reliability, and stability. In the SG, the availability of public charging infrastructure is critical to accommodate more renewable energy, lower carbon emissions, and reduce peak charging demand. Hence, the development of smart networks enables the Vehicle-to-Grid (V2G) mechanism, where energy flows from EVs to the grid, rather than only the Grid-to-Vehicle (G2V) mechanism.

The International Energy Agency reports that, by 2030, the number of EVs will reach 145 million, a sizeable share of the total global vehicle fleet. Therefore, efficiently managing EV charging/discharging to reduce cost has become challenging due to random EV arrival and departure times, the randomness of EVs' utility, and dynamic energy prices. In V2G, EVs can store excess energy from the SG to balance energy demand, smooth demand peaks, and reduce distribution losses [1].

Considerable research has been conducted in past years to reduce the charging operation cost of EVs [2]. Yao et al. [3] optimally solved the charging scheduling problem. Then, Wang et al. [4] presented an online charging algorithm to maximize the profit of charging stations using Demand Response (DR) mechanisms; there, a scheduling mechanism was proposed to efficiently manage day-ahead charging and discharging of EVs. Ortega-Vazquez et al. [5] proposed a robust optimization mechanism for EV charging that integrates price uncertainty and battery degradation. In [6], the authors proposed a two-stage stochastic approach for EV charging scheduling. Since the energy demand of EVs and energy prices are dynamic in nature, real-time scheduling techniques using various Artificial Intelligence (AI) mechanisms, for instance, Machine Learning (ML) and Deep Learning (DL), have attracted attention. Das et al. [7] designed an ML-based algorithm using support vector machines (SVMs) to learn an EV's status and decide when the EV should perform G2V/V2G. Donovan et al. [8] presented a prediction mechanism for EV availability to perform energy trading (ET) and used the Gurobi optimizer to minimize the consumer's energy bill.

Several benefits of V2G include efficiency, peak load shaving, clean fuel, quiet operation, reduced grid infrastructure, etc.; at the same time, there are disadvantages such as adverse battery effects, new infrastructure requirements, new trading mechanisms, and the need for security against cyberattacks, replay attacks, data manipulation attacks, etc. For example, Bartłomiej et al. [1] demonstrated the effectiveness of the V2G process using predictive models [9, 10]. Moreover, various ML- and DL-based approaches exist, but they face several challenges; for example, ML/DL models are trained on training data and then evaluated on a test dataset, and whenever unseen data are passed to the model, it may not perform as required. To handle the aforesaid challenges, the emerging model-free mechanism, i.e., Reinforcement Learning (RL), is a viable solution [11].

Recently, the model-free approach has proven its effectiveness in various decision-making applications such as e-healthcare and intelligent transportation systems (ITS). RL learns a good control policy, takes appropriate actions to achieve desired goals, and does not require prior knowledge of the system [12–14]. RL approaches such as the deep Q-network (DQN) and Q-learning have been widely adopted in SG systems [15–17]. Dang et al. [18] addressed the charging scheduling problem using the Q-learning mechanism. Chis et al. [19] presented an RL-based algorithm to schedule the individual charging of EVs at home, reducing the energy costs of EV owners.

Some issues are identified in the existing approaches, for instance, no guarantee of optimality, financial losses, impracticality of implementation, and scalability issues. Other issues such as single-point-of-failure, reliability, trust, data security, QoS, and transparency also exist in the available approaches [20–23]. Further, the large-scale deployment and use of EVs for energy exchange raise various security and privacy concerns such as unauthorized access to EV data, denial-of-service, jamming, data injection attacks, eavesdropping, and hijacking [24, 25]. These security attacks either harm the EVs or stop them from performing their committed task. This motivates us to design and develop a solution that copes with the aforesaid security issues.

The abovementioned issues can be addressed by adopting blockchain technology in the EV trading system [26]; blockchain is distributed in nature and integrates trust, immutability, and security [27]. Trust between two parties in the EV trading system can be achieved using self-executable and self-verifiable smart contracts (SCs) [28]. However, several blockchain-based V2G energy trading systems suffer from issues such as limited data storage and high EV data storage cost.

To handle the aforementioned issues, this paper proposes a Secure V2G Energy Trading (SV2G-ET) scheme using deep RL and Ethereum Blockchain Technology (EBT). The proposed SV2G-ET scheme benefits EV owners by maximizing their profits and securing energy trading data using blockchain (through SCs). The scheme uses EBT because EBT was designed to be open, flexible, low-cost, and suited for multiple parties cooperating in real-time trading. The EBT architecture has unique layers that strengthen energy trading systems and create new possibilities [29, 30]. SV2G-ET uses EBT for V2G trading to handle interoperability among various entities such as EV owners, utility providers, and grid operators via wireless (e.g., 5G-enabled tactile internet and 6G) or wired communication channels [31]. Table 1 shows a relative comparison of the proposed SV2G-ET scheme with existing approaches, considering their merits and demerits.

1.1. Research Gaps

Following are the research gaps identified in the existing approaches:
(i) Some issues are identified, such as no guarantee of optimality, financial losses, and impracticality of implementation [20, 22, 23].
(ii) Issues such as single-point-of-failure, reliability, trust, data security, QoS, and transparency also exist in the available approaches [29, 30].
(iii) Existing blockchain-based approaches suffer from limited data storage, data storage cost, and scalability issues [31].

To handle the aforementioned issues, this paper proposes the SV2G-ET scheme based on deep RL and EBT. The motivations for this research are listed below.

1.2. Motivation
(i) V2G energy trading is vital, since the high deployment and usage of EVs raise various security concerns such as unauthorized data access, denial-of-service, jamming, data injection attacks, and hijacking [24, 25].
(ii) Most of the existing approaches emphasize centralized solutions for EV energy trading, which suffer from single-point-of-failure, security, and privacy issues.
(iii) Security attacks either harm the EVs or stop them from performing their committed task. This motivates us to design and develop a solution that copes with the aforesaid issues.
(iv) Motivated by the above facts, this paper proposes the SV2G-ET scheme using deep RL and EBT. Here, deep RL is employed for the charging/discharging schedule of EVs.
(v) EBT is employed because it was designed to be open, flexible, low-cost, and suited for multiple parties cooperating in real-time trading. Further, deep RL works well on unseen data due to its model-free mechanism [29, 30].
1.3. Research Contributions

Following are the research contributions of this paper:
(i) A model-free deep RL-based scheme, i.e., SV2G-ET, is proposed for the charging/discharging schedule of EVs.
(ii) Secure SCs are designed for decentralized V2G, and EV data are secured using the InterPlanetary File System (IPFS) mechanism.
(iii) The proposed scheme facilitates EV owners to perform V2G/G2V with a nearby charging station to buy/sell energy in a secure way.
(iv) The effectiveness of the proposed scheme is evaluated on various parameters such as profit maximization for EV owners and high system throughput through low data storage cost, data security, and scalability.

1.4. Organization of the Paper

The remaining sections of the paper are organized as follows. Section 2 describes the system model of the proposed SV2G-ET scheme and its problem formulation. Section 3 specifies the workflow of the proposed SV2G-ET scheme. Section 4 presents the experimental results of the SV2G-ET scheme, and lastly, the paper is concluded in Section 5.

2. System Model and Problem Formulation

This section presents SV2G-ET’s system model and problem formulation of the proposed scheme.

2.1. System Model

Figure 1 shows the system model of the proposed SV2G-ET scheme, where the EV charging station's operation is considered over a time horizon $T$. Here, the arrival of EVs at an EV charging station is a random process.

In the SV2G-ET scheme, energy is generated at the grid and supplied to the charging station via a utility supplier. EVs arrive in time slot $t$, $v_t$ indicates the volume of EVs, and the energy is consumed by the end-consumer, i.e., the EV owner, through the EVs. An EV gets charged (in time slot $t$) at the charging station based on the following parameters: $\{p_t, t_a, d_t\}$. Here, $p_t$ represents the energy price at time $t$, $t_a$ is the arrival time, and $d_t$ is the charging demand. All these parameters are included in one vector $s_t$, which is represented as follows: $s_t = \{p_t, t_a, d_t\}$.

In the proposed SV2G-ET scheme, at each time slot $t$, the utility supplier supplies hourly energy prices to the charging station to perform energy trading between the grid and EVs, i.e., V2G/G2V. The proposed scheme employs a deep Q-network (DQN) that uses the dynamic energy prices to decide the V2G/G2V operation. Here, the DQN uses the RL method, as well as a Convolutional Neural Network (CNN), to approximate the Q-function, i.e., the action-value function. The DQN agent evaluates the energy prices and, in return, decides whether the action is G2V or V2G.

Once the G2V/V2G operation is performed, the entire details of the energy trading are shared securely in real time among all entities. Every entity, i.e., utility supplier, grid administrator, and EVs, is connected through EBT. Storing data on EBT is quite costly, so the proposed scheme incorporates the IPFS mechanism to store data off-chain. Then, only the hash-key of the data is stored over EBT, which reduces the energy data storage cost and improves system scalability. Moreover, IPFS stores EV owner details, energy records, etc. These EV ET data can be accessed using the hash-key over EBT. All transactions are completed with full encryption and safety using dedicated SCs designed over EBT.

2.2. Problem Formulation

In the proposed SV2G-ET scheme, the aim is to design a policy that decides whether to perform the V2G/G2V operation based upon the present energy price and the history of previous energy prices. Based on the EV's ET obtained on an hourly basis, the amount of profit is calculated. The problem is formulated as a Markov Decision Process (MDP) in discrete time steps.

At every (discrete) time step $t$, the agent observes the energy price and the availability of EVs; then, based on the agent's action, the EVs either consume energy or deliver energy. The MDP provides the mathematical framework for decision-making in which outcomes are partly controlled by a decision-maker and partly random. The MDP comprises the five-tuple $(\mathcal{S}, \mathcal{A}, \mathcal{P}, \gamma, \mathcal{R})$. Here, $\mathcal{S}$ represents the system states, $\mathcal{A}$ the finite set of actions, $\mathcal{P}$ the state transition probability, $\gamma$ the discount factor, and $\mathcal{R}$ the reward.

The details of the MDP formulation for the proposed scheme are as follows.
(i) State: the state $s_t = \{p_t, t_a, d_t\}$ comprises the current energy price, the EV's arrival time, and its charging demand, as defined in Section 2.1.
(ii) Action: in the given state $s_t$, the action $a_t$ represents the charging or discharging of energy. The charging (G2V) action is selected when the energy price is low, and the discharging (V2G) action is selected when the energy price is high. With $a_t = +1$ denoting charging, $a_t = -1$ denoting discharging, and $a_t = 0$ denoting idle, the charging/discharging constraints are defined as follows:
$$E_{\min} \le E_t + a_t \le E_{\max},$$
where $E_t$ is the energy stored in the EV battery at time slot $t$, and $E_{\min}$ and $E_{\max}$ are the battery limits.
(iii) Reward: the reward is calculated using the current action $a_t$ and the current state $s_t$. The reward function at time step $t$ is as follows:
$$r_t = r(s_t, a_t) = -\,p_t \cdot a_t.$$
The reward is the energy price at which the action occurred multiplied by the action performed (a cost when charging and a gain when discharging). At the time of departure, the reward is reduced by the price difference between the selling and buying actions of the agent.
(iv) Probability for state transition:
$$\mathcal{P}(s_{t+1} \mid s_t, a_t) = \Pr\left(S_{t+1} = s_{t+1} \mid S_t = s_t, A_t = a_t\right).$$
The above equation represents the transition probability from state $s_t$ to $s_{t+1}$ with respect to the action $a_t$. In the proposed scheme, there is no explicit model; hence, the scheme needs to learn by sampling.
(v) Action-value function: it predicts the sum of future rewards (i.e., over a window of time steps) and is used to evaluate the EV's charging/discharging schedule for the \{action, state\} pair, i.e., \{$a_t$, $s_t$\}:
$$Q^{\pi}(s_t, a_t) = \mathbb{E}\left[\sum_{k=0}^{T-t} \gamma^{k} r_{t+k} \,\middle|\, s_t, a_t\right].$$
Here, $Q^{\pi}$ is the action-value function, $\pi$ is the EV charging/discharging policy that translates a system state into a charging/discharging schedule, and $\gamma$ is the discount factor, which weighs the significance of present and future benefits [32]. Maximizing $Q^{\pi}$ improves the profit of EV owners through the V2G/G2V operation.
Once an EV's ET is done, the complete details of the transaction are shared among all stakeholders using EBT. The EBT is incorporated with the IPFS mechanism to improve data security, data storage cost $C_{st}$, and scalability of the scheme. In view of the above discussion, the SV2G-ET objective function is formed as follows:
$$\max\left(P_{o}, S_{c}, D_{s}\right) \ \text{and} \ \min\left(C_{st}\right), \quad \text{s.t.} \ E_{\min} \le E_t + a_t \le E_{\max},$$
where $D_{s}$ denotes that the proposed SV2G-ET scheme ensures data security is maintained over EBT, $P_{o}$ denotes the profit to the EV owner, $S_{c}$ denotes scalability, and $C_{st}$ denotes the data storage cost over EBT. Here, the charging and discharging limits of EVs are system-dependent parameters.
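To make the reward and battery constraints above concrete, the short Python sketch below simulates one charge/idle/discharge step under this formulation. The symbols follow the reconstruction in this section; the battery limits, the 1 kWh traded per slot, and the step() helper are illustrative assumptions, not the exact settings of the scheme.

# Minimal sketch of one MDP step for the V2G/G2V formulation above.
# Assumptions: a_t = +1 (charge/G2V), -1 (discharge/V2G), 0 (idle);
# reward r_t = -p_t * a_t; battery energy kept within [E_MIN, E_MAX].
E_MIN, E_MAX = 0.0, 30.0          # battery limits in kWh (illustrative)

def step(energy, price, action):
    """Apply one charging/discharging action and return (new_energy, reward)."""
    new_energy = energy + action   # 1 kWh traded per slot (illustrative)
    if not (E_MIN <= new_energy <= E_MAX):
        return energy, 0.0         # infeasible action: stay idle
    reward = -price * action       # pay when charging, earn when discharging
    return new_energy, reward

if __name__ == "__main__":
    e, r = step(energy=10.0, price=0.25, action=-1)   # discharge at a high price
    print(e, r)                                       # -> 9.0, 0.25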

3. The Proposed Approach

Initially, all stakeholders are registered with the EBT-based system through a Trusted Registration Authority (TRA) to participate in the energy trading. The grid administrator generates a Unique Secrecy Code (USC) for all stakeholders, such as the buyer, seller, and utility supplier. Stakeholders use the USC to prove their identity and authenticity over the authentication protocol of EBT using SCs. The TRA holds the USC information of all stakeholders, which is required for the registration process. The TRA verifies the registration token and provides the communication key for interaction over EBT using SCs. Figure 2 shows the complete registration procedure for buyers, sellers, and utility providers along with the SG administrator. In the case of a security attack, the attacker first requests data access from EBT; since the attacker is not registered with the system (registration requires a USC), data accessibility is denied by the smart contracts (SCs). Thus, the SCs create a defense layer that handles the security issues in the proposed SV2G-ET scheme.

Figure 3 illustrates the workflow of the SV2G-ET scheme incorporating deep RL (i.e., DQN) and EBT. Here, hourly energy prices are supplied to the DQN model, and the reward is calculated accordingly using the action-value function and random action selection. In the proposed scheme, the action-value function is updated based on the Bellman equation, which is as follows:
$$Q(s_t, a_t) = \mathbb{E}\left[r_t + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1})\right].$$
From the above equation, we obtain the expected return of the current state by summing the reward at time $t$ and the expected reward at time $t+1$ obtained by taking action $a_{t+1}$, discounted by $\gamma$. Moreover, after converging to the optimal value function $Q^{*}$, schedules are calculated using the greedy approach, which is defined as follows:
$$a_t^{*} = \arg\max_{a} Q^{*}(s_t, a).$$

DQN employs Q-learning, one of the most frequently used RL methods, and also includes a CNN to approximate the Q-function, i.e., the action-value function. Here, DQN uses the method of experience replay, and with probability $\epsilon$ a random action $a$ is selected, as shown in
$$a_t = \begin{cases} \text{random action}, & \text{with probability } \epsilon,\\ \arg\max_{a} Q(s_t, a; \theta), & \text{otherwise},\end{cases}$$
and then the loss is calculated by the loss function and the network parameters are updated over the episodes:
$$L(\theta) = \mathbb{E}\left[\left(r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta)\right)^{2}\right],$$
where $\gamma$ is the discount factor, i.e., the decay rate, $r$ is the reward, $\theta$ are the Q-network parameters, and $\theta^{-}$ are the target network parameters used to compute the target. Here, the loss is calculated by adding the reward to the discounted target value and subtracting the predicted value.

Input: Energy prices, vehicle volume, reward
Output: Deep Q-network parameters $\theta$.
(1) DNN parameters $\theta$ are initialized randomly.
(2) Target network parameters $\theta^{-} = \theta$ are initialized.
(3) for episode = 1 to $M$ do
(4) Get the initial state $s_1$.
(5) for timestep $t$ = 1 to $T$ do
(6) With probability $\epsilon$, select action $a_t$ randomly; otherwise, select $a_t = \arg\max_{a} Q(s_t, a; \theta)$.
(7) Execute action $a_t$, calculate the reward $r_t$, and proceed to the next state $s_{t+1}$.
(8) Store the state transition details $(s_t, a_t, r_t, s_{t+1})$ in the replay memory.
(9) Calculate the loss function $L(\theta)$.
(10) Update the parameters $\theta$ using the Gradient Descent algorithm.
(11) Every $N$ steps, copy the weights from $\theta$ to $\theta^{-}$.
(12) end for
(13) end for
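As a complement to the pseudocode, the following is a compact, runnable sketch of Algorithm 1 in Python/PyTorch. The network sizes, learning rate, episode counts, and the synthetic random-price environment are illustrative assumptions only, not the exact settings used in the paper.

# Illustrative DQN trainer following Algorithm 1 (hyperparameters are assumptions).
import random
from collections import deque
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 24, 3                  # 24-hour price history; charge/idle/discharge
GAMMA, EPSILON, BATCH, SYNC_N = 0.9, 0.1, 32, 100
ACTION_VALUES = {0: 1.0, 1: 0.0, 2: -1.0}     # index -> a_t: charge, idle, discharge

def make_net():
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))

q_net, target_net = make_net(), make_net()
target_net.load_state_dict(q_net.state_dict())            # step (2): theta_target <- theta
optimizer = torch.optim.SGD(q_net.parameters(), lr=1e-3)  # step (10): gradient descent
replay = deque(maxlen=10_000)                              # step (8): replay memory

def select_action(state):
    if random.random() < EPSILON:                          # step (6): epsilon-greedy
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return int(q_net(state).argmax())

def train_step():
    if len(replay) < BATCH:
        return
    batch = random.sample(replay, BATCH)
    s = torch.stack([b[0] for b in batch])
    a = torch.tensor([b[1] for b in batch])
    r = torch.tensor([b[2] for b in batch])
    s_next = torch.stack([b[3] for b in batch])
    with torch.no_grad():                                  # step (9): Bellman target
        target = r + GAMMA * target_net(s_next).max(dim=1).values
    pred = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

steps = 0
for episode in range(50):                                  # step (3): episodes
    state = torch.rand(STATE_DIM)                          # step (4): initial price history
    for t in range(24):                                    # step (5): time steps
        action = select_action(state)
        price = float(state[-1])
        reward = -price * ACTION_VALUES[action]            # step (7): r_t = -p_t * a_t
        next_state = torch.cat([state[1:], torch.rand(1)])
        replay.append((state, action, reward, next_state))
        train_step()
        steps += 1
        if steps % SYNC_N == 0:                            # step (11): sync target network
            target_net.load_state_dict(q_net.state_dict())
        state = next_state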

The extraction of distinguishing characteristics from raw data is a critical step in improving the action-value function approximation, and the scheduling method can reduce charging costs by utilizing these characteristics. Here, a feed-forward CNN is used with one or more layers of units between the input and output layers. The output units represent a hyperplane in the space of the input patterns. For layer $l$ of the $L$ layers, the weights are denoted as $W_l$, $b_l$ represents the bias, $y_l$ denotes the output, and $f(\cdot)$ denotes the activation function, i.e., $y_l = f(W_l\, y_{l-1} + b_l)$. The output of this representation network is then fed into the Q-network for the estimation of the approximate optimal action-value function.
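A minimal sketch of such a feed-forward representation layer is shown below in plain NumPy; the layer sizes and the ReLU activation are assumptions for illustration only.

# Minimal feed-forward representation layers: y_l = f(W_l @ y_{l-1} + b_l).
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def forward(x, layers):
    """Pass the price-history vector x through each (W, b) layer."""
    y = x
    for W, b in layers:
        y = relu(W @ y + b)
    return y

# 24-hour price history -> 16 hidden units -> 8 features (illustrative sizes).
layers = [(rng.normal(size=(16, 24)), np.zeros(16)),
          (rng.normal(size=(8, 16)), np.zeros(8))]
features = forward(rng.random(24), layers)   # features are then fed to the Q-network
print(features.shape)                        # -> (8,)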

In order to reduce EV owners' unwillingness to participate, the owner's opinion should be known in advance. When the owner decides not to participate in the trading, the vehicle will neither gain nor release energy; hence, there will be no change in the charging state of the vehicle. Under peak load conditions, the bidding strategy depends on the different responses given by EV owners. When an owner is unwilling to participate, that vehicle should not be included in the trading. Since participation is maximal when the gains for EV owners (via the buy and sell prices) are maximal, such a scenario has been considered in the proposed scheme. Further, reducing participation unwillingness in EV energy trading by providing incentives to EV owners from the grid is an extension of this research work.

Algorithm 1 shows the steps for obtaining the DQN parameters, and Algorithm 2 presents the steps for reward calculation and the V2G/G2V operation. Once the V2G/G2V operation is performed, the ET details are published over EBT and stored in an off-chain storage system using IPFS. Further, private and public keys are generated and shared among all stakeholders for real-time secure access to the ET data.

In the existing EBT-based approaches, the EVs' ET details are stored directly on the EBT, which is a costly operation, i.e., around USD 76,000 per 1 GB of storage. Hence, a secure, distributed, and immutable IPFS mechanism can handle this ET data storage issue using its Merkle directed acyclic graph structure. This speeds up the ET data storage process by using low bandwidth and also handles data duplication by generating a unique hash value for each ET data item. The integration of the IPFS mechanism with EBT improves the scalability and transaction latency of the proposed scheme.

The proposed SV2G-ET scheme comprises IPFS-based EBT, which involves only the storage cost of the hash-key in place of the complete EV energy trading data storage cost. IPFS is a distributed, open-source storage system that is free and immutable. IPFS receives the energy trading data tuple from the charging/discharging station $CS$ of the grid and generates the hash-key of this tuple. IPFS splits the energy trading file into small chunks, encrypts them with a random encryption key, and satisfies the predefined conditions of the SCs. Next, these encrypted chunks are stored over IPFS, and their hash-key is stored over EBT. The hash-key uses 256 bits for SHA-256 on EBT, and a single tuple occupies only one word in the SV2G-ET scheme.
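The off-chain flow can be sketched in Python as follows: the ET record is encrypted, pushed to IPFS, and only the returned content hash is kept for the Ethereum contract. The smart-contract handle and its storeHashKey() function are hypothetical placeholders; the ipfshttpclient and cryptography calls are standard usage of those libraries.

# Sketch: encrypt an ET record, store it off-chain on IPFS, and keep only the
# returned hash-key for the EBT smart contract. The contract object and its
# storeHashKey() function are hypothetical placeholders.
import json
import ipfshttpclient
from cryptography.fernet import Fernet

def publish_et_record(record: dict, contract=None, account=None) -> str:
    key = Fernet.generate_key()                      # random encryption key
    ciphertext = Fernet(key).encrypt(json.dumps(record).encode())

    with ipfshttpclient.connect() as ipfs:           # local IPFS daemon
        cid = ipfs.add_bytes(ciphertext)             # off-chain storage, returns hash

    if contract is not None:                         # write only the hash-key on-chain
        contract.functions.storeHashKey(cid).transact({"from": account})
    return cid

cid = publish_et_record({"ev": "EV-1", "kWh": 5, "price": 0.25})
print("ET hash-key for EBT:", cid)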

Input: State $s_t$, energy prices, vehicle volume
Output: EV G2V/V2G schedules
(1) Load the DNN parameters $\theta$ trained by Algorithm 1.
(2) for timestep $t$ = 1 to $T$ do
(3) Obtain the past 2 days of energy prices.
(4) The representation network extracts the features from the energy price data.
(5) Calculate the action-value $Q(s_t, a_t; \theta)$ with the Q-network.
(6) With probability $\epsilon$, the action is selected randomly; otherwise, $a_t = \arg\max_{a} Q(s_t, a; \theta)$.
(7) Output the G2V/V2G schedule.
(8) Execute ET_Transaction_Details() for secure access.
(9) Publish the ET details over EBT and store them off-chain in IPFS.
(10) Generate private_key and public_key.
(11) Distribute private_key and public_key to all stakeholders, i.e., EV_Owner, utility_supplier, and SG administrator.
(12) end for
return

4. Performance Evaluation

This section highlights the performance evaluation of the proposed SV2G-ET scheme with respect to the scalability, ET data storage cost, and profit to the EVs owner. The detailed explanation is as follows.

4.1. Dataset Description

The energy price dataset is taken from Nord Pool [36]. In the proposed SV2G-ET scheme, the hourly prices are divided into training and testing parts: the first 20 days of every month constitute the training dataset, and the remaining 10 days of the month form the test dataset. Furthermore, the EV data are taken from the International Energy Agency (IEA) [37] and comprise the energy capacity of 30 EVs belonging to two different categories: battery electric vehicles (BEVs) and plug-in hybrid electric vehicles (PHEVs). To evaluate the performance of the proposed SV2G-ET scheme, we integrate the energy price data and the EV data. The obtained results may vary with parameters such as block size, block creation time, and endorsement policy.
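As a concrete illustration of this split, the snippet below partitions an hourly price series by day of month; the CSV file name and the timestamp/price column names are assumptions, not the actual Nord Pool export format.

# Split hourly prices: days 1-20 of each month -> train, days 21+ -> test.
# The file name and the 'timestamp' column name are illustrative assumptions.
import pandas as pd

prices = pd.read_csv("nordpool_hourly_prices.csv", parse_dates=["timestamp"])
day_of_month = prices["timestamp"].dt.day

train = prices[day_of_month <= 20]     # first 20 days of every month
test = prices[day_of_month > 20]       # remaining ~10 days of every month

print(len(train), "training hours,", len(test), "test hours")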

Figure 4 illustrates the opening (marked in green) and closing (marked in blue) energy prices from January to September 2021. Then, Figure 5 shows the hourly price variation throughout the day, i.e., the highest energy price (marked in blue) and the lowest energy price (marked in orange).

4.2. Experimental Results

In the beginning, all simulation parameters are set; Table 2 lists these parameters. It is worth emphasizing that the proposed learning-based scheme makes no hypotheses about the distributions of random variables such as arrival/departure time and energy consumption while creating the dataset. The battery capacity is considered to be 30 kWh for each EV. The charging actions correspond to the discharging and charging states of the EV.

The discount factor is set to 0.9 for the agent to evaluate each of its actions. The price history for the last 24 hours is provided to the representation network, whose output is then fed into the Q-network for the calculation of the action-value function. Here, the parameters are updated using the gradient descent algorithm over randomly sampled experience.

After training for 10,000 episodes, the optimal V2G/G2V scheduling for EV charging is obtained. In each episode, the reward is calculated, and learning starts after 1,000 episodes. Figure 6 shows the cumulative rewards, smoothed over every 100 episodes. It is observed from the graph that rewards start to increase as soon as the agent starts to learn. Then, around episode 8,000, the rewards increase further due to the exploration and exploitation mechanism.
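The 100-episode smoothing used for Figure 6 is a simple moving average; a short sketch is shown below (the reward values are random placeholders).

# Smooth per-episode rewards with a 100-episode moving average (as in Figure 6).
import numpy as np

episode_rewards = np.random.randn(10_000).cumsum()   # placeholder reward curve
window = 100
smoothed = np.convolve(episode_rewards, np.ones(window) / window, mode="valid")
print(smoothed.shape)                                 # -> (9901,)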

To investigate the performance of the proposed SV2G-ET scheme, ET is performed by the EVs over a period of days. Figure 7 illustrates the ET, where the vertical axis shows the prices, taken as the mean price of the hour, and the x-axis shows the time at which the ET takes place. Further, green and red circles show discharging (V2G) and charging (G2V), respectively. The graph depicts that the proposed scheme is able to detect high prices based on previous actions when it makes a V2G transaction, whereas it performs G2V transactions when the energy price is low. The agent receives data in real time and acts on it using a learned pattern that allows the EV to discharge at (near) peak prices and to charge at low prices. The agent mainly remains idle at average prices.

In the proposed SV2G-ET scheme, the accumulated charging cost of EVs over 10 days (the test dataset) is calculated. Here, profit indicates the number of units of currency, starting from 1.0 unit.

Figure 8 shows the result for ET, where profit is on the y-axis and days are on the x-axis. The baseline-1 approach (Mhaisen et al. [33]) uses a heuristic strategy representing the common strategy in which the EV is charged fully and then discharged until the stored energy reaches a certain threshold limit. In the SV2G-ET scheme, the threshold limit is a fixed, preset value. Profit indicates the number of units of currency, starting from 1.0 unit. For baseline-2, the optimal case is taken, assuming the future energy prices are known, to obtain the maximum profit. The profit generated by the proposed SV2G-ET scheme is promising compared to baseline-1. Moreover, the optimal method always generates profit because it knows the future prices, which is not possible in a real-life scenario.
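For reference, the baseline-1 heuristic can be sketched as below: charge to full capacity, then discharge until a stored-energy threshold is reached. The capacity, threshold, and per-slot energy values are illustrative assumptions.

# Baseline-1 heuristic (after Mhaisen et al. [33]): charge fully, then discharge
# until the stored energy falls to a threshold. Values here are illustrative.
CAPACITY, THRESHOLD, STEP = 30.0, 10.0, 1.0   # kWh; energy moved per time slot

def heuristic_action(stored_energy, charging_phase):
    """Return (action, still_charging): +1 charge, -1 discharge, 0 idle."""
    if charging_phase:
        if stored_energy < CAPACITY:
            return +1, True                    # keep charging until full
        return -1, False                       # full: switch to discharging
    if stored_energy - STEP >= THRESHOLD:
        return -1, False                       # discharge down to the threshold
    return 0, False                            # hold at the threshold

energy, charging = 12.0, True
for _ in range(40):
    action, charging = heuristic_action(energy, charging)
    energy += action * STEP
print(energy)                                  # settles at the threshold (10.0)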

Figure 9 depicts the comparison of SV2G-ET's scalability with the existing approaches based on the transaction time of EV trading and the blocks mined. In the proposed scheme, ET data are stored in an off-chain system, i.e., IPFS, and merely the hash-key of the ET is sent to EBT. Here, the size of the hash-key is 160 bits, which is much smaller than the original EV ET transaction size, measured in bytes. This facilitates adding more transactions to EBT. Therefore, the SV2G-ET scheme serves more transactions and more EV owners in the same quantum of time, thus improving the scalability of the system.

Figure 10 illustrates a comparative analysis of ET data storage cost between the existing EBT-based approaches and the proposed SV2G-ET scheme, which uses the IPFS mechanism for off-chain data storage. In the existing approaches, the ET details are stored on the EBT itself, which is a costly operation. In contrast, the proposed SV2G-ET scheme uses a distributed off-chain mechanism, i.e., IPFS, to store ET data, which is a relatively low-cost storage system compared to the existing approaches.

EVs are limited by range and speed; most EVs have a range of 50 to 100 miles before they need to be recharged [38]. Hence, EVs cannot be used for long journeys on a single charge, which means that V2G energy trading must happen within this range. The RL agent makes charging/discharging decisions mostly according to the battery level of an EV (a major challenge due to the mobility of vehicles). Therefore, the RL agent can generate feasible solutions, but it is difficult to find optimal solutions, which is an interesting direction for future research. Further, handling EV owners' unwillingness to participate in energy trading also needs attention; it can be addressed by providing incentives to EV owners from the grid, which is an extension of this research. Other parameters that require attention and will be an extension of this study are communication latency, communication cost, and EV owner privacy.

5. Conclusion

In this paper, EV charging/discharging is formulated using a model-free RL methodology, which securely supports V2G/G2V in combination with EBT. The randomness of energy prices and EV commutes is considered in the problem formulation to obtain an optimal scheduling mechanism. This paper proposed a secure model-free scheme, i.e., SV2G-ET, which does not require information about environment uncertainty. Here, a deep Q-network is employed to estimate the action-value function, and the ET data are securely made accessible to all stakeholders, such as the SG, EV owners, and utility suppliers, using EBT. The performance of the proposed SV2G-ET scheme is evaluated by displaying the charging/discharging trend versus real-time pricing. The promising results show the effectiveness of the SV2G-ET scheme, which is able to learn useful charging and discharging patterns of EVs during off-peak/peak hours. Further, the profit generated by EV owners over 10 days is calculated and shows better results than the heuristic approach, while approaching the optimal scenario that assumes known future prices.

In the future, communication cost, EV owner privacy, and latency will be addressed as extensions of this research work while accessing the EVs' ET data in real time using blockchain technology.

Data Availability

No data are associated with this study.

Conflicts of Interest

The authors declare that they have no conflicts of interest.