Hybrid energy storage systems are a practical way to overcome the limitations of single energy storage systems, which struggle to deliver high specific power and high specific energy at the same time. These systems are especially applicable in electric and hybrid vehicles. A dynamic and coherent strategy plays a key role in managing a hybrid energy storage system. The data obtained while driving and the information collected from the energy storage systems can be used to analyze the performance of a given energy management method. Most existing energy management models follow predetermined rules that are unsuitable for vehicles operating in different modes and conditions. It is therefore advantageous to provide an energy management system that can learn from the environment and the driving cycle and send the needed data to a control system for optimal management. This research applies machine learning to increase the efficiency of a hybrid energy storage management system. The energy management system is designed based on machine learning methods so that it can learn to take the necessary actions in different situations directly, without selecting and running predefined rules. The advantage of this method is accurate and effective control with high efficiency through direct interaction with the environment around the system. The numerical results show that the proposed machine learning method can achieve the least mean square error in all strategies.

1. Introduction

Reducing fuel usage, diversifying energy sources, and adopting electric propulsion technologies are some of the measures increasingly taken around the world to make vehicles cleaner and more efficient, with the ultimate purpose of reducing greenhouse gas emissions and reaching a sustainable energy ecosystem [1–3]. Hybrid electric vehicles are expected to have significantly lower fuel consumption than conventional vehicles as well as substantially lower emissions [4–6]. It is essential to equip hybrid electric vehicles with advanced energy management systems to achieve these goals. Electric Vehicles (EVs) and Hybrid Electric Vehicles (HEVs) can operate in a variety of modes, such as fully electric or power distribution, which are controlled by an energy management system based on driving conditions [7–10]. Accordingly, an energy management system tries to improve the power output of multiple sources and supply the necessary power while minimizing the most significant costs [11, 12].

In recent years, researchers have proposed a number of models and methods for energy management in HEVs, and several reviews have summarized the progress of research in this area. Currently, the greatest challenge in improving the energy management systems of HEVs is how to shorten their computational processes and improve their adaptability. These challenges can be overcome with the help of learning methods [13–16]. In particular, reinforcement learning can be used to achieve enhanced energy management in Hybrid Energy Storage Systems (HESSs) [17, 18]. Recently, researchers have shown a growing interest in energy management approaches that utilize novel methods like machine learning or artificial intelligence [19–21]. It should be noted that machine learning is a data analysis technique used to solve a wide range of industrial problems [22].

This paper presents a comprehensive approach to reinforcement learning-based energy management strategies. In general, the applications of reinforcement learning in energy management can be classified into two categories: (1) “simple algorithms,” which refers to using a single algorithm (e.g., Q-learning, dynamic learning, or SARSA) to produce energy management policies; (2) “hybrid algorithms,” which refers to combining reinforcement learning with other algorithms (e.g., forecasting algorithms, deep learning algorithms, and predictive control models) or data [23].

The content of this article is organized as follows. First, the paper provides a review of HESSs and discusses the challenges of energy management in these systems and the optimization constraints in this area. Next, the paper reviews the variety of reinforcement learning methods that can be utilized in HESSs, including simple and hybrid algorithms. The article attempts to cover various methods and vehicle types and compare their main functional features.

1.1. Hybrid Energy Storage System

A Hybrid Energy Storage System (HESS) is an energy storage system comprised of two or more energy storage sources that meet the requirements of complex driving conditions. Considering their function, HESSs need to have a suitable energy management mechanism and topology to ensure well-coordinated power distribution among different energy sources. This also affects the lifespan of HESSs and the performance, efficiency, and cost-effectiveness of the broader system by determining how well power is distributed among different components. Energy management systems and topologies are the most popular topics of research into HESSs. Since different topologies have different power sources, there could be significant differences between different energy management systems. In recent years, many new studies have been conducted on the energy management system and topology of HESSs, but there have been few reviews of progress in this field.

The energy storage systems of EVs or HEVs have a wide range of features and functions. The performance of these systems is primarily indicated by rated power, charge/discharge rate, power density, energy density, self-discharge rate, response time, energy storage efficiency, and cycle life. With the help of these indicators, one can choose a suitable energy storage system based on performance requirements. Energy storage systems can be broadly divided into three categories, mechanical, electrical, and chemical, which are shown in Figure 1. The characteristic parameters of conventional energy storage systems are provided in Table 1.

Table 1 compares six energy storage systems. As can be seen, the highest power consumption is for air compressors, and the lowest is for lithium batteries. The cycle life of each of these systems is also presented, which can be an important criterion in choosing an energy storage system.

1.2. Energy Management Strategy

The main function of an energy management system is to balance the distribution of power between multiple energy sources and the power source so as to optimize a series of cost functions such as fuel consumption, battery life, emission, and driving control. This problem is usually formulated as a control optimization problem with specific control objectives and physical constraints. These control objectives may include one or more parameters such as exhaust temperature, emissions, fuel consumption, battery’s state of charge (SOC) and state of health (SOH), and power consumption costs. Figure 2 shows the energy management problem for a typical HEV.

(1) Dynamics of power transmission systems: power transmission dynamic variables such as vehicle speed and acceleration, generator speed, power demand, battery SOC.
(2) Model and other components: basic mathematical formulations for transmission components such as electric motor, battery pack, generator, and supercapacitor, and their connections to other components.
(3) Control objectives: optimization objectives such as emission, fuel consumption, battery life, driving mobility, power costs, and shift frequency.
(4) Physical constraints: constraints imposed on important variables such as battery SOC, power demand, rotation speed, torque, and gears.

The optimal control problem is often subject to three types of physical constraints: propulsion dynamics, initial and final values of state variables, and constraints that apply to control and state variables. Once inputs such as power demand, vehicle speed, SOC, current, steering angle, and speed are given, the amount of power needed from each energy source and the corresponding fuel cost can be calculated with the help of propulsion dynamics formulations. It is common to consider battery SOC, gearbox position, and engine/generator speed as state variables. It is also typical to treat engine output torque with throttle position, gear shift, and clutch position (in multimode HEVs like Toyota Prius and Chevrolet Volt) as control measures. To solve this optimal control problem, it is necessary to define appropriate constraints for these parameters. In addition to control objectives and constraints, it is necessary to build a comprehensive model of power transmission system components as part of the solution process.
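The three types of constraints named above can be written in a generic continuous-time form. The following is only a sketch of the standard optimal control template, not a formulation taken from the studies reviewed here. The symbols are assumptions introduced for illustration: x(t) is the state vector (e.g., battery SOC), u(t) is the control vector (e.g., engine torque), L is the instantaneous cost (e.g., fuel rate), f describes the propulsion dynamics, and T is the trip duration.

```latex
\begin{aligned}
\min_{u(t)} \quad & J = \int_{0}^{T} L\bigl(x(t), u(t), t\bigr)\,dt \\
\text{s.t.} \quad & \dot{x}(t) = f\bigl(x(t), u(t), t\bigr)
  && \text{(propulsion dynamics)} \\
& x(0) = x_{0}, \qquad x(T) = x_{T}
  && \text{(initial and final state values)} \\
& x_{\min} \le x(t) \le x_{\max}, \qquad u_{\min} \le u(t) \le u_{\max}
  && \text{(state and control bounds)}
\end{aligned}
```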

2. Literature Review

In reinforcement learning, the agent, which is the core element, learns how to map inputs (states/modes) to outputs (optimal control measures) to maximize a cumulative reward. This reward maximization is usually done through a trial-and-error process, which involves observing how each action affects the reward at present and in the future (lagged effects). The basic steps of reinforcement learning include detecting the state of the environment, taking certain actions, and improving the actions with rewards serving as guidance [26].

Reinforcement learning has three characteristic features. The first feature is the balance and coordination between exploration and exploitation. The agent uses the exploration phase to gain knowledge about the environment and then proceeds to exploitation, which means taking a control action based on the existing knowledge. The second feature of reinforcement learning is the ability to adapt measures without needing external control, which is essential in cases where the environment is vague or uncertain. Through its interactions with the environment, the agent can identify the state of the environment and take appropriate actions to affect it if needed. The third characteristic feature of reinforcement learning is that it is Markovian, meaning that the conditional probability distribution of future states of the environment depends only on the current state and not on the sequence of events that preceded it.
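One common way to balance exploration and exploitation is an epsilon-greedy rule. The sketch below is illustrative only and is not taken from the studies reviewed here. The function name `epsilon_greedy` and the parameter `epsilon` (the probability of exploring) are assumptions introduced for this example.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from a list of estimated action values.

    With probability epsilon a random action is tried (exploration);
    otherwise the action with the highest estimate is used (exploitation).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    best_value = max(q_values)
    return q_values.index(best_value)  # exploit (first index on ties)


if __name__ == "__main__":
    rng = random.Random(0)
    estimates = [0.1, 0.5, 0.2]  # hypothetical value estimates for 3 actions
    picks = [epsilon_greedy(estimates, 0.1, rng) for _ in range(1000)]
    print("share of greedy picks:", picks.count(1) / len(picks))
```

In practice, epsilon is often reduced over time so that the agent explores more at the start and exploits more once its estimates have stabilized.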

For the problem of energy management in HESSs, the environment can be modeled as propulsion dynamics and driving conditions. The agent can be considered as a power distribution controller operating with a series of algorithms, the purpose of which is to search for a sequence of actions that maximizes reward based on the available state and reward information.
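To make the environment and agent roles concrete, the following toy model treats a battery plus supercapacitor as the environment and a power-split fraction as the agent's action. It is a minimal sketch under strong assumptions: the class `ToyHessEnv`, the capacities, the SOC limits, and the reward weights are all hypothetical values chosen for illustration, and the model ignores losses and voltage dynamics.

```python
from dataclasses import dataclass


@dataclass
class ToyHessEnv:
    """Hypothetical battery + supercapacitor model; all numbers are illustrative."""

    battery_kwh: float = 10.0
    supercap_kwh: float = 0.5
    dt_h: float = 1.0 / 3600.0  # one-second step, expressed in hours
    battery_soc: float = 0.8
    supercap_soc: float = 0.8

    def state(self):
        """Return the observable state as rounded SOC values."""
        return (round(self.battery_soc, 3), round(self.supercap_soc, 3))

    def step(self, demand_kw, battery_share):
        """Apply one control action and return (next_state, reward).

        battery_share in [0, 1] is the fraction of the power demand met by
        the battery; the supercapacitor covers the remainder.
        """
        battery_share = min(max(battery_share, 0.0), 1.0)
        p_batt = demand_kw * battery_share
        p_cap = demand_kw - p_batt
        self.battery_soc -= p_batt * self.dt_h / self.battery_kwh
        self.supercap_soc -= p_cap * self.dt_h / self.supercap_kwh
        # Reward: penalize battery power (a crude proxy for battery wear)
        # and penalize leaving the assumed 0.2-1.0 SOC window.
        reward = -(p_batt ** 2) * 1e-3
        if not 0.2 <= self.battery_soc <= 1.0:
            reward -= 10.0
        if not 0.2 <= self.supercap_soc <= 1.0:
            reward -= 10.0
        self.battery_soc = min(max(self.battery_soc, 0.0), 1.0)
        self.supercap_soc = min(max(self.supercap_soc, 0.0), 1.0)
        return self.state(), reward


if __name__ == "__main__":
    env = ToyHessEnv()
    print(env.step(demand_kw=20.0, battery_share=0.5))
```

A real study would replace this model with validated propulsion dynamics and a reward that reflects the chosen control objectives.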

To train a reinforcement learning algorithm, it is necessary to define a value function for the agent. This value function is a function of state, action, and reward, and is typically denoted by Q(s, a) (where s is the state and a is the action). State, action, and reward information can be collected in real driving conditions for the HESS energy management problem. Then, Markov decision processes (MDPs) can be used to model these variables, which means the next state and reward can be predicted based only on the current information and independent of historical data. Finally, the value function can be calculated to determine the best control action. Reinforcement learning algorithms differ in the criteria they use for updating the value function.
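The point that algorithms differ in how they update the value function can be illustrated with the textbook tabular updates of Q-learning and SARSA. This is a generic sketch, not the implementation used in any study cited here. The learning rate `alpha`, the discount factor `gamma`, and the dictionary-based table `Q` are assumptions introduced for illustration.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    """Off-policy update: bootstrap from the best action in the next state."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.95):
    """On-policy update: bootstrap from the action actually taken next."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])


if __name__ == "__main__":
    Q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
    actions = [0, 1]
    Q[("s1", 0)], Q[("s1", 1)] = 1.0, 2.0
    q_learning_update(Q, "s0", 0, r=1.0, s_next="s1", actions=actions)
    sarsa_update(Q, "s0", 1, r=1.0, s_next="s1", a_next=0)
    print(Q[("s0", 0)], Q[("s0", 1)])
```

The only difference between the two functions is the bootstrap term: Q-learning uses the maximum over next actions, while SARSA uses the value of the next action chosen by the current policy.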

2.1. Application of Reinforcement Learning in HESS Energy Management

This section provides a summary of advanced reinforcement learning approaches used for the purpose of energy management in HESSs. The first part of the section is devoted to initial attempts to use simple algorithms in this field. Then, the section proceeds to review recent progress in the combined use of multiple algorithms and the development of hybrid algorithms for HESS energy management. Table 2 provides an overview of the variety of algorithms used in HESS energy management.

3. Methodology

3.1. Machine Learning

Machine learning is concerned with the design of machines that learn from the examples given to them and from their own experience. The aim is to use algorithms to design a machine that can learn and operate without each action being explicitly programmed. Instead of programming everything, the data are given to a general algorithm, and this algorithm builds its logic based on the data given to it. Machine learning has a variety of methods, including supervised, unsupervised, and reinforcement learning [2].

Machine learning is closely related to (and often overlaps with) computational statistics, which also focuses on making predictions with computers, and is linked to mathematical optimization, which supplies methods, theories, and applications to the field. Machine learning is sometimes conflated with data mining; the latter focuses more on exploratory data analysis and is known as unsupervised learning. Machine learning can also be unsupervised and used to learn baseline behavioral profiles for various entities and then find significant anomalies [3].

In data analysis, machine learning is a method for designing complex algorithms and models used for forecasting; in industry, this is known as predictive analytics. Machine learning grew out of the field of artificial intelligence. In the early days of artificial intelligence as a scientific discipline, some researchers were interested in making machines learn from data. They tried to solve this problem with various symbolic methods as well as with neural networks.

However, the growing emphasis on logical and knowledge-based methods created a gap between artificial intelligence (AI) and machine learning. Probabilistic systems were plagued by theoretical and practical problems with acquiring and representing data. By 1980, expert systems had come to dominate AI, and statistics had fallen out of favor. Work on knowledge-based learning continued within the realm of AI, leading to inductive logic programming, but the more statistical line of research moved outside the realm of AI and was found in pattern recognition and information retrieval. Research on neural networks was also abandoned by AI and Computer Science (CS) at about the same time. This path was pursued outside of AI/CS by researchers in other disciplines, including Hopfield, Rumelhart, and Hinton, under the name of connectionism. Their major success came in the mid-1980s with backpropagation [18].

Machine learning and data mining often use the same methods and overlap significantly, but while machine learning focuses on prediction based on known properties learned from training data, data mining focuses on discovering (previously) unknown properties in the data (this is the analysis step of knowledge discovery in databases). Data mining uses several machine learning methods but with different purposes; machine learning, on the other hand, uses data mining techniques as “unsupervised learning” or as a preprocessing step to improve learner accuracy. Much of the confusion between the two disciplines (which often have distinct conferences and journals, with the exception of ECML PKDD) stems from their underlying assumptions: in machine learning, performance is usually assessed by the ability to reproduce known knowledge, while in knowledge discovery and data mining (KDD), the key activity is to discover previously unknown knowledge. When evaluated against known knowledge, an unsupervised (uninformed) method is easily outperformed by supervised methods, while in a typical KDD activity, supervised methods cannot be used because training data are unavailable.

The accuracy of classification machine learning models can be estimated using techniques such as the holdout method, which divides the data into a training set and a test set (usually two-thirds of the data in the training set and one-third in the test set) and evaluates the performance of the trained model on the test set. In comparison, the k-fold cross-validation method randomly divides the data into k subsets and runs k experiments, each using k−1 subsets for model training and the remaining subset for testing the model's predictive ability. In addition to the holdout and cross-validation methods, bootstrap, which samples n items from the data set with replacement, can be used to evaluate model accuracy [18].
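The three resampling schemes described above can be sketched with the Python standard library alone. This is an illustrative sketch rather than the procedure used in this study. The function names, the fixed `seed` arguments, and the use of the out-of-bag items as the bootstrap test set are assumptions introduced for the example.

```python
import random


def holdout_split(items, train_fraction=2 / 3, seed=0):
    """Shuffle once, then cut into a training set and a test set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def k_fold_splits(items, k, seed=0):
    """Yield k (train, test) pairs; each fold is the test set exactly once."""
    shuffled = list(items)
    if not 2 <= k <= len(shuffled):
        raise ValueError("k must be between 2 and the number of items")
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]


def bootstrap_sample(items, seed=0):
    """Draw len(items) items with replacement; unused items are out-of-bag."""
    pool = list(items)
    rng = random.Random(seed)
    sample = [rng.choice(pool) for _ in pool]
    chosen = set(sample)
    out_of_bag = [x for x in pool if x not in chosen]
    return sample, out_of_bag


if __name__ == "__main__":
    data = list(range(12))  # stand-in for 12 labeled records
    train, test = holdout_split(data)
    print(len(train), len(test))
    for fold_train, fold_test in k_fold_splits(data, k=3):
        print(len(fold_train), len(fold_test))
    sample, oob = bootstrap_sample(data)
    print(len(sample), len(oob))
```

Libraries such as scikit-learn provide equivalent utilities; the hand-written versions are shown only to make the partitioning logic explicit.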

3.2. Learning Algorithms

In 2012, Hsu et al. [27] used the Q-learning algorithm to improve the power management system of electric or hybrid bikes. These researchers defined the power management objectives as improved riding safety and comfort and more efficient use of battery power. The simulation results of this study showed that the proposed power management system could offer 24% and 50% improvement in the riding quality and energy efficiency objectives, respectively. Since then, many researchers have started using reinforcement learning algorithms in HESS energy management rather than optimal control theory. For example, Qi et al. used the Q-learning algorithm to optimize the battery SOC maintenance strategy of an HEV [28]. When combined with a sustainable strategy, this method can offer a balance between real-time performance and energy efficiency optimization. In [29], Liu used the inverse reinforcement learning method to create a probabilistic driving path prediction system, which predicts the suitable engine/battery power distribution ratio based on its forecast of driver behavior.

Over the past several years, Liu et al. have also conducted a number of studies on the use of reinforcement learning-based power distribution controls in hybrid transmission systems. First, they evaluated the adaptability, optimality, and learning capability of a Q-learning-based energy management strategy for a hybrid tracked vehicle [30]. Next, to develop real-time controls for a hybrid transmission system, they integrated an online recursive algorithm into the Q-learning structure so that control strategies can be updated in real time [31]. However, these algorithms may not be robust against variability in driving conditions, i.e., different driving behaviors, driving areas, and road environments.

4. Proposed Hybrid Energy System

With the rapid development of deep learning and artificial intelligence in recent years, the energy management strategies of hybrid vehicles have become increasingly intelligent. It is now typical to embed two or more algorithms that process different types of information into a reinforcement learning framework in order to ensure more efficient, real-time controls based on speed and power requirement forecasts, information shared between vehicles or between vehicles and infrastructure, and interactions with smart grids and smart cities.

Deep reinforcement learning (DRL) has also proved to be an effective tool for designing an adaptive energy management strategy based on driving cycle data. Hu et al. evaluated the performance of a DRL-based energy management strategy with online learning capability in comparison to a rule-based strategy [32]. The diagram of the DRL-based control strategy proposed in this study is illustrated in Figure 3. The numerical results are presented in Table 3 and Figure 4.

As Figure 3 shows, there are two neural networks: one for the nontransaction mode and one for the transaction mode. Table 3 presents five different machine learning strategies. For each strategy, the proportion of data split into training and test sets (for each neural network) is specified, and three error indicators are reported after the machine learning approach is applied. The results in Table 3 and Figure 4 show that strategy 5 had the lowest error rate. This is because, in strategy 5, both neural networks are implemented in a similar way. The highest error was reported for strategy 2, in which the data were evenly divided between the training and test sets. Accordingly, it can be concluded that the proposed machine learning method can achieve the least mean square error in all strategies.
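For readers who want to reproduce this kind of comparison, the sketch below computes the mean square error on a test set together with two other widely used error measures. The text does not name the three error indicators in Table 3, so the choice of root mean square error and mean absolute error here is an assumption, and the sample values are made up purely for illustration.

```python
import math


def error_metrics(y_true, y_pred):
    """Return MSE, RMSE, and MAE for paired true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae}


if __name__ == "__main__":
    # Hypothetical test-set targets and model outputs, not data from Table 3.
    targets = [1.0, 2.0, 3.0, 4.0]
    outputs = [1.1, 1.9, 3.2, 3.7]
    print(error_metrics(targets, outputs))
```

Computing the same metrics for each train/test split makes the strategies directly comparable, provided every strategy is scored on data the model did not see during training.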

Real-time management under driving conditions must be achieved through the selection of the best adaptive strategies based on changes in the value function. In the case of hybrid algorithms, more data are needed for this purpose. In [33], researchers used a deep neural network (DNN) to train value functions offline and then used these functions in Q-learning for online control with adaptability to different transmission systems and driving conditions.

Xiong et al. developed a hybrid reinforcement learning-based real-time energy management system by combining the Q-learning algorithm with an online value function updating procedure. The product of this combination was a real-time control system in which control measures can be updated in real time. They then validated this strategy by running simulations on a hardware-in-the-loop system with batteries and supercapacitors added to the circuit [34].

5. Conclusion

Machine learning is a broad discipline that has produced learning algorithms that can guide stimuli, detect spoken language, and discover hidden patterns in growing volumes of data. Financial data are no exception: machine learning works with data streams that capture company characteristics, corporate governance characteristics, audit reports, market data, and environmental variables. Machine learning algorithms detect complex patterns in these data, select the best variables to explain the target variable, and use the appropriate combination of variables to predict samples accurately. They are the key to unlocking large, growing data sources that enable better predictions and smarter decisions. Machine learning has received a great deal of attention in the social sciences and challenges the existing approaches to data analysis.

Although deep learning-based energy management strategies are superior to rule-based strategies, two issues limit their use in practice. The first issue is the limited computation power of the vehicle’s CPU, which makes it necessary to install another computer on the vehicle to process the data. The second issue is collecting and storing the required information because deep learning requires substantial amounts of data to infer strategies for different driving conditions. However, with the development of network communication technologies and intelligent transportation systems (ITS), it is likely to become easier to use deep learning-based real-time energy management strategies in the future.

This paper provided a summary of reinforcement learning-based energy management strategies for the Hybrid Energy Storage Systems (HESSs) of hybrid electric vehicles (HEVs). The paper started with an introduction to the problem of energy management in this field and how learning methods can be used to solve this problem. Then, the existing energy management schemes with multiple control objectives were discussed. Finally, the outlook for the further development of reinforcement learning-based energy management systems was described.

A potentially rewarding line of research in this field is to work on the more efficient use of artificial intelligence techniques for energy management. It is also important to assess the theoretical and practical feasibility of the proposed methods through real-world testing and implementation as well as simulation. Considering the advancement of intelligent transportation systems, which also make it easier to gather traffic data, it might also be helpful to develop a method for adjusting strategies based on the behaviors of vehicles and infrastructure.

Data Availability

The data are available from the corresponding author on request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.