#### Abstract

The cell faults of lithium-ion batteries will lead to the atypical deterioration of battery performance and even thermal runaway. In this paper, a novel fault diagnosis method for lithium-ion batteries of electric vehicles based on real-time voltage is proposed. Firstly, the voltage distribution of battery cells is confirmed in electric vehicles, and the reasons are analyzed. Furthermore, kurtosis is utilized to discover cell faults for the first time. After the kurtosis-based strategy alarm, the faulty cells in the battery pack are identified through multidimensional scaling and density-based spatial clustering of applications with noise. This method reduces the computational load of the data platform due to the characteristics of the sequential structure. Finally, the strategies to quantify the level of faulty cells and evaluate the safety of electric vehicles are presented. Through the real-time data collected by electric vehicles, it has been proven that this method can warn and locate faulty cells earlier than the original system method and has better robustness than other unsupervised fault diagnosis methods.

#### 1. Introduction

To alleviate the energy crisis and worsening environmental pollution, lithium-ion batteries are widely used in electric vehicles (EVs) because of their long cycle life, cleanliness, high energy density, and high power density [1, 2]. EVs are the development trend of future automobiles and the focus of competition in the global automobile industry, and China regards the EV industry as one of its key strategic emerging industries. As the core component of an EV, the lithium-ion battery is assembled from many cells in series or parallel to provide drivers with sufficient range and power performance [3]. Therefore, the state of the battery directly determines the overall performance of an EV. A potential battery cell failure degrades the overall performance of the EV and can even cause battery short circuits. In extreme cases, it leads to safety accidents such as thermal runaway [4, 5]. In recent years, EV safety accidents have occurred frequently, which hinders the development of the EV industry [6]. The cell voltage is the most intuitive and effective dynamic information available during the operation of EVs. Early and precise detection of voltage faults helps in taking measures to avoid safety accidents [7]. Early warning and isolation of faulty battery units based on real-time battery parameters are therefore of great importance for improving the safety of EVs.

To enhance the reliability and safety of lithium-ion batteries, many scholars have proposed different methods for lithium-ion battery fault diagnosis. Current fault diagnosis methods can be divided into three categories: experience-based methods, model-based methods, and data-driven methods [5, 8, 9].

The experience-based method relies on existing prior knowledge and uses logical analysis and reasoning about the relationships between events to achieve battery fault diagnosis. It can be divided into expert systems [10], fuzzy logic [11], and graph theory [12]. Experience-based methods have no learning ability, which limits their generalization and makes them less widely applied in lithium-ion battery fault diagnosis. The model-based method establishes a mathematical or chemical model of the lithium-ion battery. Residual signals, obtained by comparing the measured signals with the model outputs, are used to detect and identify faults [13]. Xiong et al. [14] estimated the state of charge (SOC) with an unscented Kalman filter (UKF) and recursive least squares (RLS) and located voltage sensor faults from the residuals. Wei et al. [15] presented a second-order equivalent circuit model (ECM) and a strong tracking extended Kalman filter (ST-EKF) to estimate the terminal voltage online for battery fault diagnosis. Song et al. [16] designed a fractional-order Luenberger observer for fault detection. Dey et al. [17] utilized a diagnosis scheme based on a partial differential equation (PDE) model for thermal faults in lithium-ion batteries. Kong et al. [18] developed an electrochemical model for lithium-ion batteries to detect cells with an early internal short circuit. The advantage of the model-based method is that the model mechanism is clear and easy to modify. However, the model-based method is difficult to apply in practice because it involves many parameters or complex equations, and sensor errors can easily affect its performance.

With the development of machine learning and artificial intelligence, as well as the advancement of computer software and hardware, data-driven methods have received a lot of attention in battery fault diagnosis in recent years. The data-driven method can be regarded as a “black box,” which overcomes the problems of complicated modeling and parameter identification [8, 19]. Jiang et al. [11] established an early warning model for the charging safety of EVs by monitoring the cell voltage with a back propagation neural network (BPNN). Yao et al. [20] decomposed and reconstructed the voltage data through the reconstructing discrete wavelet transform (RDWT) and then input the result into a general regression neural network (GRNN) to divide the faults into four levels. Li et al. [21] located potential faulty cells by combining an ECM with long short-term memory (LSTM) neural networks. Jiang et al. [19] used a support vector machine (SVM) to study the temperature rise caused by short circuit failure of batteries. Yang et al. [22] applied random forests (RF) to external short circuit fault diagnosis. The above methods require large training sets to achieve good results, which leads to excessive memory consumption and makes them unsuitable for new vehicles. Another group of methods, such as those based on entropy theory, diagnoses and detects lithium-ion battery faults by converting basic information into characteristic information. Duan et al. [23] designed a method for evaluating the inconsistency of cells based on Shannon entropy. Liu et al. [24] evaluated each cell based on Shannon entropy to determine the potential faulty cells. The Shannon entropy theory has also been applied to the detection of battery pack connection faults [23]. The improved Shannon entropy [7], sample entropy [25], and multiscale entropy [6] have also been utilized for battery fault diagnosis. However, some improved entropy theories require an excessive number of iterations, which increases the running time of the method.

Methods based on statistical knowledge and related techniques have great prospects in practical fault diagnosis. Xia et al. [26] proposed a method to detect internal short circuit faults by calculating the correlation coefficient between cell voltages. Zhao et al. [4] presented a 3σ multilevel screening strategy and a neural network to screen abnormal cells and used a clustering algorithm to verify the effectiveness of the screening strategy. Sun et al. [27] used the correlation coefficient and the K-means clustering method to diagnose faulty cells. Wang et al. [28] established a data-driven diagnosis model for abnormal cell capacity based on statistics and a tree-based model. She et al. [29] developed a battery state of health (SOH) estimation model based on incremental capacity analysis (ICA) that considers the inconsistency of cells. Chen et al. [30] used the local outlier factor (LOF) to detect voltage faults of battery cells. Zhang et al. [31] optimized the multiobjective design of the hybrid energy storage system for EVs to extend the battery life and reduce the failure rate. Some of the above studies were carried out in the laboratory. In practice, however, the battery faces a more complex situation because of the environment and the different driving modes of drivers.

In this paper, the data are not obtained from charging and discharging lithium-ion batteries in the laboratory. All of the data come from real-world vehicle operation, which makes the fault diagnosis model more practical. The cell voltage distribution of EVs during charging and discharging is analyzed as a function of mileage. Firstly, kurtosis is used as an early warning indicator for faulty cells for the first time. Secondly, the method combines multidimensional scaling (MDS) with density-based spatial clustering of applications with noise (DBSCAN) to diagnose faulty cells. This subsequent calculation is performed only after the kurtosis-based detection raises an alarm, which reduces the computing load of the platform. Furthermore, the fault type of the abnormal cell is determined and quantified by using statistical knowledge, and the voltage consistency of the whole vehicle is quantified. Finally, the validity of the method is verified with real-time data collected from EVs, and the method is compared with other unsupervised methods.

The rest of the paper is organized as follows. Section 2 describes the data acquisition and analysis. Section 3 introduces the methodology for voltage fault diagnosis. Section 4 discusses the diagnostic results in detail. Section 5 concludes this paper.

#### 2. Data Acquisition and Analysis

##### 2.1. The Data Source of EVs

EVs are characterized by informatization and networking. Therefore, China has established data centers based on big data technology. For the data acquisition of EVs, China published the technical specifications of remote service and management systems for EVs in 2016. The format, range, and frequency of the collected data are also stipulated in this specification. Currently, the EV big data centers form a three-level structure: the Enterprise Service and Monitoring Center (ESMC), the Local Service and Management Center (LSMC), and the National Service and Management Center (NSMC). The structure of the EV data center is shown in Figure 1. During the operation of an EV, the vehicle terminal transmits the data collected by the sensors to the data center through the general packet radio service (GPRS) wireless network according to the specification. All data are first collected by the ESMC and then reported to the other centers following the communication protocol.

The data platform architecture is mainly based on the Linux system and the Java programming language and is built with the Hadoop system. By April 2022, 8 million new energy vehicles had been connected to the NSMC. Simple data statistics cannot meet the current data analysis needs of the new energy automotive industry. Data mining and the exploration of the laws behind the data deserve more attention, for example, in EV safety warnings and the segmented analysis of vehicle technology. To solve the above problems, it is necessary to establish relevant models that use the computing power of a big data platform or a targeted cloud platform in close combination with the actual operation of new energy vehicles, so as to analyze the operation of new energy vehicles and provide early warning [32].

##### 2.2. Data Content and Cleaning

To ensure the acquisition of real and sufficient data, this paper collects one year of operating data of 10 electric buses from the ESMC of an electric vehicle enterprise in Suzhou, China. The collected data cover the systems that are active during driving, including the battery system, the motor drive system, and the vehicle control system, as shown in Table 1 [24].

In this paper, the cell voltages of EVs are extracted for analysis. The voltage acquisition accuracy is 0.01 V, and the format obtained from the ESMC is a matrix. The row vector of the matrix is the voltage data of all cells at a certain moment, and each column vector is the voltage data of each cell. The acquisition frequency of each collected data is 0.1 Hz.

Due to the influence of buildings and the environment during driving, the data may be lost in the steps of data acquisition, transmission, and decoding, resulting in null values. Sensor failure also causes the collected data to be obviously abnormal or null, so it is necessary to clean the data.

The data cleaning strategies of this paper are as follows: Extreme abnormal data, such as the 65,535 V readings caused by signal loss, are removed with box plots. Box plots are a common method for handling outliers, and they place no strict requirements on the distribution of the data being processed. The way the box plot eliminates outliers is shown in Figure 2.

In Figure 2, $Q_3$ and $Q_1$ are the upper and lower quartiles of the data, respectively. The interquartile range (IQR) is the length of the box, which is the blue square in Figure 2.

The calculation formula is as follows:

$$\mathrm{IQR} = Q_3 - Q_1 \tag{1}$$

To prevent the fault values from being eliminated, the variable $k$ is used to expand the upper limit beyond $Q_3$ and the lower limit beyond $Q_1$. The interval of normal values is $[Q_1 - k\,\mathrm{IQR},\ Q_3 + k\,\mathrm{IQR}]$. The range of outliers is $(-\infty,\ Q_1 - k\,\mathrm{IQR}) \cup (Q_3 + k\,\mathrm{IQR},\ +\infty)$.

Missing data spanning more than 3 consecutive rows are discarded, and the remaining missing data are filled with the previous value.
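The two cleaning rules above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `clean_voltages`, the expansion factor `k=3.0`, and the choice to compute the quartiles over the whole matrix are hypothetical and are not taken from the paper.

```python
import numpy as np

def clean_voltages(v, k=3.0, max_gap=3):
    """Clean a (time x cells) voltage matrix.

    k (box-plot expansion factor) and max_gap are illustrative values.
    """
    v = np.array(v, dtype=float)               # work on a copy
    q1, q3 = np.nanpercentile(v, [25, 75])     # quartiles, ignoring NaN
    iqr = q3 - q1
    with np.errstate(invalid="ignore"):        # NaN comparisons are simply False
        outlier = (v < q1 - k * iqr) | (v > q3 + k * iqr)
    v[outlier] = np.nan                        # extreme values become missing

    missing = np.isnan(v).any(axis=1)          # a row is missing if any cell is
    keep = np.ones(len(v), dtype=bool)
    i = 0
    while i < len(v):
        if not missing[i]:
            i += 1
            continue
        j = i
        while j < len(v) and missing[j]:       # find the end of the missing run
            j += 1
        if j - i > max_gap or i == 0:
            keep[i:j] = False                  # run too long (or nothing to copy): discard
        else:
            for row in range(i, j):            # fill from the previous row
                gaps = np.isnan(v[row])
                v[row, gaps] = v[row - 1, gaps]
        i = j
    return v[keep]
```

With this sketch, a single 65,535 V reading is replaced by the previous value of that cell, while a run of four or more missing rows is dropped entirely.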

##### 2.3. Voltage Distribution Analysis

Some battery cell fault studies regard the cell voltage distribution as a normal distribution and propose the Z-score [33] or 3σ [4] method to diagnose voltage faults. However, the conclusion that the cell voltages conform to the normal distribution has not been verified. In one study, a series capacity degradation model was established in the laboratory environment, and an accelerated aging test showed that the degradation of the batteries is mutually independent and conforms to the normal distribution [34]. In the real world, however, the complicated operating conditions of EVs make it difficult for the battery cells to approximate the normal distribution. The reasons are as follows, and some of them are verified in Section 4.1.

(1) The number of battery cells varies from dozens [4] to thousands [8] across vehicles, so for some EVs the number of cells is insufficient to support the normal distribution theory.

(2) Some EVs have battery packs installed in different locations. In this study, the battery packs are installed on the side and back of the EV. Differences in temperature and other conditions between locations affect all the cells in a battery pack, so that all the cells in one pack may even perform worse than those in other packs.

(3) The cells are connected in series and parallel. During charge and discharge, the cells affect each other, so each cell is not completely independent. In addition, the different control strategies of the balancing system in the battery management system (BMS) also influence the parameters of the battery cells.

The normal distribution has some simple and efficient methods for screening outliers, but forcing data to be considered as normal distribution will degrade the accuracy of the model. Therefore, battery cell distribution should be tested and analyzed when developing faulty cell diagnosis methods.

#### 3. Methodology

##### 3.1. Kurtosis

Kurtosis can be defined as the standardized fourth population moment about the mean, which is a dimensionless parameter [35]. It was originally used to represent impulsive characteristics in signal processing and to quantify waveform peaks. Kurtosis has been widely used in fault diagnosis [36], signal processing [37], and other fields. The formula of kurtosis is shown in formula (2), and $\sigma$ is the standard deviation (SD) as shown in formula (3):

$$K = \frac{E\left[(x-\mu)^4\right]}{\sigma^4} = \frac{\mu_4}{\sigma^4} \tag{2}$$

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i-\mu\right)^2} \tag{3}$$

where $K$ is the kurtosis, $E$ is the expected value, $\mu$ is the sample mean of the $n$ samples $x_i$, and $\mu_4$ is the fourth moment about the mean.
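As a minimal sketch of why kurtosis reacts strongly to a single deviating value, the snippet below computes the standardized fourth moment with NumPy and compares it with the SD. The 336 synthetic voltages, the 0.10 V offset, and the cell index 216 are made-up illustration values, not measured data.

```python
import numpy as np

def kurtosis(x):
    """Standardized fourth moment about the mean (not excess kurtosis)."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    sigma = x.std()                      # population SD (divides by n)
    return np.mean((x - mu) ** 4) / sigma ** 4

# Synthetic pack: 336 cell voltages with a small, bounded spread
normal = 3.30 + 0.01 * np.sin(np.arange(336))
faulty = normal.copy()
faulty[216] -= 0.10                      # one cell drifts low (hypothetical fault)

print("SD       :", normal.std(), "->", faulty.std())
print("kurtosis :", kurtosis(normal), "->", kurtosis(faulty))
```

In this synthetic case the SD changes by well under a factor of two, while the kurtosis grows by more than an order of magnitude, because the fourth power weights the single outlier far more heavily than the second power does.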

##### 3.2. Multidimensional Scaling

Multidimensional scaling (MDS) is a multivariate data analysis method that represents the “distance” between objects in a low-dimensional space. The principle is that, given the similarities between objects, MDS places the objects in two-dimensional or three-dimensional space so that these similarities are preserved as far as possible [38]. After optimization by Kruskal et al. [39], it has been widely used in fields such as medical analysis [40] and environmental research [41]. The specific calculation steps are as follows:

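There are $n$ objects in the set, and the similarity between each pair of objects is calculated with the correlation coefficient from statistics. Let $s_{ij}$ be the similarity between objects $i$ and $j$. The generalized distance matrix $S = (s_{ij})$ is constructed, and its off-diagonal elements are arranged from small to large as follows:

$$s_{i_1 j_1} \le s_{i_2 j_2} \le \cdots \le s_{i_m j_m}, \quad m = \frac{n(n-1)}{2} \tag{4}$$

Here $s_{i_k j_k}$ is the similarity of objects $i_k$ and $j_k$ calculated by the correlation coefficient, and the subscript $k$ indicates that it ranks $k$-th.

The Euclidean distance matrix is defined in *r*-dimensional space:

$$D = (d_{ij})_{n \times n} \tag{5}$$

where $d_{ij}$ is the Euclidean distance between elements $i$ and $j$ in $r$-dimensional space.

The matrix $D$ satisfies the following condition:

$$d_{i_1 j_1} \le d_{i_2 j_2} \le \cdots \le d_{i_m j_m} \tag{6}$$

where $d_{i_l j_l}$ is the distance between elements $i_l$ and $j_l$, and the subscript *l* indicates that it ranks $l$-th.

To find a configuration of points that makes $D$ and $S$ as similar as possible, MDS changes the positions of the objects in *r*-dimensional space by iteration. To measure this similarity, Kruskal’s normalized stress-1 criterion is defined as follows [42]:

$$\text{Stress-1} = \sqrt{\frac{\sum_{i<j}\left(d_{ij}-\hat{d}_{ij}\right)^2}{\sum_{i<j} d_{ij}^2}} \tag{7}$$

where $\hat{d}_{ij}$ is the disparity, that is, the monotone transformation of $s_{ij}$ that best fits $d_{ij}$.

The evaluation standard for stress-1 is shown in Table 2 [39].

In order to reduce dimensions, the centered matrix $B$ based on the distances of the objects is constructed as follows:

$$B = -\frac{1}{2} J D^{(2)} J, \quad J = I - \frac{1}{n}\mathbf{1}\mathbf{1}^{T} \tag{8}$$

where $D^{(2)} = (d_{ij}^2)$ is the matrix of squared distances and $\mathbf{1}$ is the all-ones vector of length $n$.

The eigenvalues and eigenvectors of *B* are calculated and arranged as follows:

$$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r > 0, \quad E = (e_1, e_2, \ldots, e_r) \tag{9}$$

where $\lambda_k$ is the $k$-th eigenvalue of matrix $B$, $e_k$ is the eigenvector corresponding to the eigenvalue $\lambda_k$ of matrix $B$, and $r$ is the dimension of the MDS output.

The coordinates of each object in the low-dimensional space can be expressed as follows:

$$X = E\,\Lambda^{1/2}, \quad \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_r) \tag{10}$$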
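The eigendecomposition route described above (double centering, then scaling the leading eigenvectors) can be sketched with NumPy as follows. This is a sketch of classical (metric) MDS only. The function names are hypothetical, and the stress computation uses the input distances in place of Kruskal's disparities, which is a simplification.

```python
import numpy as np

def classical_mds(dist, r=2):
    """Embed n objects in r dimensions from an (n x n) distance matrix."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    b = -0.5 * j @ d2 @ j                        # double-centered matrix
    vals, vecs = np.linalg.eigh(b)               # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:r]             # indices of the r largest
    lam = np.clip(vals[top], 0.0, None)          # guard against tiny negatives
    return vecs[:, top] * np.sqrt(lam)           # coordinates, one row per object

def stress1(dist, coords):
    """Stress-1 with the input distances used as disparities (simplification)."""
    dist = np.asarray(dist, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    d_fit = np.sqrt((diff ** 2).sum(axis=-1))    # distances in the embedding
    iu = np.triu_indices(len(dist), k=1)         # each pair once
    return np.sqrt(((d_fit[iu] - dist[iu]) ** 2).sum() / (d_fit[iu] ** 2).sum())
```

For distances that really come from points in a plane, `classical_mds(dist, r=2)` reproduces them up to rotation and reflection, so the stress is numerically zero.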

##### 3.3. Density-Based Spatial Clustering of Applications with Noise

The density-based spatial clustering of applications with noise (DBSCAN) algorithm is a density-based spatial clustering method proposed by Ester et al. [43]. It is one of the most representative density-based clustering methods. The algorithm defines the following concepts: Define $\varepsilon$ as the radius parameter and *Minpts* as the neighborhood density threshold. Within a set, $p$ is considered a core point if the number of objects within the radius $\varepsilon$ of $p$ exceeds *Minpts*. If $q$ is within the radius $\varepsilon$ of the core point $p$, then $q$ and $p$ are called directly density reachable. For the objects $p_1, p_2, \ldots, p_i, \ldots, p_n$, if all $p_i$ and $p_{i+1}$ are directly density reachable, then $p_1$ and $p_n$ are called density reachable. For *w* in the set, if both $p$ and $q$ are density reachable from *w*, then $p$ and $q$ are called density connected.

The process of DBSCAN is as follows: All objects in the set are traversed using the relations of directly density reachable, density reachable, and density connected. When one cluster is complete, the next unvisited point is selected to start a new cluster, until no new point can be added to any cluster. Finally, the objects are grouped into several clusters, and the objects not included in any cluster are noise data [44].
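The traversal described above can be sketched as a small NumPy implementation. It is meant only to make the core point, reachability, and noise concepts concrete. It counts a point as part of its own neighborhood and uses "at least `min_pts`" for the core test, which is one common convention and an assumption here, and the function name is hypothetical.

```python
import numpy as np

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    dist = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1))
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    core = np.array([len(nb) >= min_pts for nb in neighbors])

    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue                      # only unlabeled core points seed a cluster
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            if not core[p]:
                continue                  # border points join but do not expand
            for q in neighbors[p]:
                if labels[q] == -1:       # directly density reachable from p
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels                         # points still at -1 are noise
```

Points that never become reachable from a core point keep the label -1, and these are the noise points that the diagnosis method interprets as candidate faulty cells.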

##### 3.4. Diagnosis Method

Firstly, the real-time operation data of the EV are obtained from the ESMC. Through data cleaning, the obviously abnormal values of the cell voltages are removed and the missing values are filled. The cell voltage matrix $V$ is then assembled at the acquisition interval.

The matrix $V$ is as follows:

$$V = \begin{bmatrix} v_{11} & v_{12} & \cdots & v_{1n} \\ v_{21} & v_{22} & \cdots & v_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ v_{m1} & v_{m2} & \cdots & v_{mn} \end{bmatrix} \tag{11}$$

where $m$ is the number of sampling times and $n$ is the number of cells. $v_{ij}$ is the voltage of the $j$-th cell at the $i$-th sampling time.

A sliding window without repeated samples is applied to the voltage matrix $V$. The kurtosis of the cell voltages at each sampling time is calculated. If the kurtosis exceeds the set threshold for 3 consecutive moments, a battery cell fault is determined, which triggers the faulty cell alarm. The kurtosis matrix $K$ is shown below:

$$K = \left[k_1, k_2, \ldots, k_m\right]^{T} \tag{12}$$

where $k_i$ is the kurtosis of the sample at time $i$.
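A minimal sketch of this alarm logic is given below: one kurtosis value per sampling time (computed across the cells), non-overlapping windows, and an alarm once the threshold is exceeded at three consecutive times. The function names are hypothetical, and the threshold is passed in by the caller rather than fixed here.

```python
import numpy as np

def row_kurtosis(v):
    """Kurtosis across cells for every sampling time of a (time x cells) matrix."""
    v = np.asarray(v, dtype=float)
    dev = v - v.mean(axis=1, keepdims=True)
    m2 = (dev ** 2).mean(axis=1)              # second central moment per row
    m4 = (dev ** 4).mean(axis=1)              # fourth central moment per row
    return m4 / m2 ** 2                       # undefined if all cells are equal

def nonoverlapping_windows(v, size=100):
    """Split the rows into consecutive windows that share no samples."""
    v = np.asarray(v)
    return [v[s:s + size] for s in range(0, len(v) - size + 1, size)]

def first_alarm(k, threshold, consecutive=3):
    """Index at which the threshold has been exceeded `consecutive` times in a row."""
    run = 0
    for i, value in enumerate(k):
        run = run + 1 if value > threshold else 0
        if run >= consecutive:
            return i
    return None                               # no alarm in this window
```

Requiring several consecutive exceedances, rather than a single one, is what keeps an isolated noisy sample from triggering the more expensive localization step.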

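If the voltage alarm is triggered, the distance matrix is obtained by calculating the Euclidean distance between all cell voltage curves in the window. The Euclidean distance between cell voltage curves is calculated as follows:

$$s_{pq} = \sqrt{\sum_{i=1}^{m}\left(v_{ip}-v_{iq}\right)^2} \tag{13}$$

where $s_{pq}$ is the similarity between the $p$-th cell voltage curve and the $q$-th cell voltage curve, $v_{ip}$ is the $p$-th cell voltage value at the $i$-th moment, and $m$ is the number of sampling moments in the window.

The similarity between cell voltage curves is expressed by the matrix $S$:

$$S = \left(s_{pq}\right)_{n \times n} \tag{14}$$

where $n$ is the number of cells. If all cell voltages change consistently during the operation of the EV, the matrix $S$ should approach the zero matrix.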

The distance matrix is mapped to two-dimensional space with MDS. The data of the two dimensions are normalized separately so that all values lie in the interval [0, 1]. DBSCAN is then used to determine the noise points in the two-dimensional space, and the index numbers of the noise points are the faulty cell numbers.
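The localization step can be sketched end to end as follows. This is an illustrative sketch, not the authors' implementation: it uses classical (eigendecomposition) MDS, treats a point as DBSCAN noise when it is neither a core point nor within the radius of one, and the function names are hypothetical. The defaults `eps=0.3` and `min_pts=5` follow the values reported in Section 4.3.

```python
import numpy as np

def pairwise(x):
    """Euclidean distance between every pair of rows of x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def locate_faulty_cells(window, eps=0.3, min_pts=5):
    """window: (time x cells) voltages. Returns indices of DBSCAN noise cells."""
    curves = np.asarray(window, dtype=float).T        # one row per cell curve
    s = pairwise(curves)                              # curve-to-curve distances
    n = len(s)

    # Classical MDS to two dimensions
    j = np.eye(n) - np.ones((n, n)) / n
    vals, vecs = np.linalg.eigh(-0.5 * j @ (s ** 2) @ j)
    top = np.argsort(vals)[::-1][:2]
    xy = vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))

    # Min-max normalization of each dimension to [0, 1]
    span = xy.max(axis=0) - xy.min(axis=0)
    xy = (xy - xy.min(axis=0)) / np.where(span > 0, span, 1.0)

    # DBSCAN noise: not a core point and not within eps of any core point
    d = pairwise(xy)
    core = (d <= eps).sum(axis=1) >= min_pts          # neighborhood includes itself
    near_core = ((d <= eps) & core[None, :]).any(axis=1)
    return np.flatnonzero(~near_core)
```

Because each dimension is rescaled to [0, 1], the radius is relative to the spread of the current window, which is why a small absolute voltage offset can still separate a faulty cell from the rest of the pack.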

When the faulty cell numbers are determined, formula (15) is used to judge whether the faulty cell is overvoltage or undervoltage. The average voltage $\bar{v}_i$ is given by formula (16), where $v_{ij}$ is the voltage of the $j$-th cell at the $i$-th sampling time and $\bar{v}_i$ is the average voltage of the cells at the $i$-th time.

A BIAS greater than 0 indicates cell overvoltage, and a BIAS less than 0 indicates cell undervoltage.
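The sign rule can be illustrated with a short sketch. The exact form of the BIAS formula is not reproduced in this text, so the code below makes an explicit assumption: it takes BIAS as the mean difference between a cell's voltage and the average voltage of all cells over the window. The function names are hypothetical.

```python
import numpy as np

def cell_bias(window, cell):
    """Assumed BIAS: mean of (cell voltage - average cell voltage) over the window."""
    v = np.asarray(window, dtype=float)       # (time x cells)
    pack_mean = v.mean(axis=1)                # average cell voltage at each time
    return float(np.mean(v[:, cell] - pack_mean))

def fault_type(bias):
    """Positive bias means overvoltage, negative bias means undervoltage."""
    if bias > 0:
        return "overvoltage"
    if bias < 0:
        return "undervoltage"
    return "none"
```

Under this assumption, two cells that drift by equal amounts in opposite directions produce BIAS values of equal magnitude and opposite sign.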

Define the concept of vehicle inconsistency score to quantify the safety of each vehicle as follows:
where $T$ is the total number of sampling times in a window and $k_t$ is the kurtosis at time $t$. The higher the inconsistency score, the worse the consistency of the battery cells in the EV.

The characteristic of this diagnostic method is that the subsequent calculation to locate the faulty cell is performed only after the kurtosis-based prealarm. This reduces the computational load of the data centers and makes the method suitable for practical application. The flowchart of the method is illustrated in Figure 3.

#### 4. Results and Discussion

##### 4.1. Distribution Results

To test whether the cell voltages of the EVs in this paper conform to the normal distribution, the cell voltages of five vehicles are analyzed as the mileage changes. The cell voltage distributions of the EVs at 0 km, 10,000 km, 20,000 km, 30,000 km, and 40,000 km are extracted. The following restrictions are added to ensure that the comparison is not disturbed by other factors:

(1) The charging and discharging conditions of the EVs are distinguished, and the voltage distributions during charging and discharging are recorded for each vehicle at each mileage.

(2) For the discharge case, the total discharge current is between 23.5 A and 26.5 A, and the SOC is between 78% and 82%. For the charging case, the SOC is between 78% and 82%, and the charging current is in its platform period, with a deviation of no more than 5 A.

Based on the above restrictions, 50 samples of cell voltage distribution were obtained. The distributions of the five vehicles are similar, so vehicle 1 is taken as an example. The voltage distributions and Q-Q plots of vehicle 1 are shown in Figure 4 for the discharge state and in Figure 5 for the charge state.

The voltage distribution is similar to the long tail distribution when the battery is discharged, and the Q-Q plots are similar to the inclined “.” The reason for the long tail distribution is that there are double peaks, indicating that more data deviate from the normal distribution, which tends to weaken with the increase of mileage. But the voltage distribution approximates the left-skewed distribution when the battery is charged, and the Q-Q plots are similar to the inverted “.” The reason for the left-skewed distribution is that more data are centered on the left. The cell voltages are more concentrated when charging because of the balance system of the BMS.

Based on the Shapiro-Wilk test, none of the above 50 samples could be accepted as normally distributed at the 0.05 level, which proves that the battery voltage of EVs does not conform to the normal distribution regardless of the mileage change or whether it is charged or discharged. To increase the test sample size, 2,000 driving segments (whether charging or discharging) were randomly selected from multiple vehicles for the Shapiro-Wilk test, and the results did not conform to the normal distribution.

For the vehicles in this study, 64 temperature probes are installed in 12 battery packs. Based on the one-year driving data of one vehicle, the distribution of which probes recorded the maximum and the minimum temperature is shown in Figure 6(a). The range is the difference between the maximum and minimum probe temperatures, and it is shown in Figure 6(b). The cells in different locations are therefore at different temperatures, and the temperature range across the probes frequently exceeds 3°C, which affects the performance of some battery packs. The difference between battery packs makes the cell voltage distribution at a given moment resemble a square wave, as shown in Figure 7.

##### 4.2. Cell Fault Early Alarm

Figure 8 shows the battery voltage, battery current, and mileage of an EV while it is driving. When the driver accelerates the vehicle, the battery current increases and the voltage decreases slightly. When the driver brakes the vehicle, the current is negative because of the braking energy feedback.

The vast majority of cell voltages are normal during the running time of an EV. Only a few cells are abnormal, and their voltage differences from the normal cells are subtle, especially at the beginning of the cell fault. SD is commonly used in many EVs as an early warning indicator for detecting cell faults [6]. However, traditional detection indicators such as SD cannot detect cell faults and raise alarms at the beginning of the failure. Kurtosis, in contrast, can sensitively capture peaks in the voltage curves.

The following figures compare the warning capabilities of SD and kurtosis for cell fault. The cell voltage distribution of one hour before the cell fault alarm on that day is shown in Figure 9(a), and the voltage distribution curve of a vehicle during normal operation is shown in Figure 9(b). The SD of the normal moment and the moment before the fault changes slightly. But the kurtosis before the cell fault is much greater than that at the normal moment, which can effectively identify the fault and avoid the problem of false alarms caused by setting a small threshold.

The operation data of 10 EVs is selected. The cell voltages are extracted based on the sliding window, and the window size is 100. The kurtosis for each window of 10 vehicles is calculated. In the first window, the kurtosis of vehicle 5 is significantly higher than that of other vehicles as shown in Figure 10. However, at this time, vehicle 5 has no faulty cell alarm, and the cell fault alarm is not triggered until 3 hours later as shown in Figure 11. But the fault is detected by the kurtosis at the first window. The above research proves that kurtosis can be used as an indicator of the faulty cell early warning.

##### 4.3. Fault Cell Identification

The similarity matrix between the voltage curves of the 336 cells of vehicle 5 is calculated. MDS is applied to the distance matrix, and the results are mapped into two-dimensional space. The stress-1 is 0.00171, indicating that MDS fits well. DBSCAN in the two-dimensional space shows that cells 216 and 217 are distinct outliers. The result of DBSCAN is illustrated in Figure 12. Observing the voltage curves shows that the curves of cells 216 and 217 are offset. The voltage curves of the 336 cells corresponding to the first window are shown in Figure 13.

Although the actual offset is small, this difference can be magnified based on MDS and DBSCAN for cell fault diagnosis.

Through a large number of sampling windows of the 10 vehicles, the kurtosis alarm threshold is set to 60 by trial and error. In DBSCAN, $\varepsilon$ is 0.3 and *Minpts* is 5.

The data from the day before vehicle 5 triggered the alarm are selected. According to the fault diagnosis method proposed in this paper, the kurtosis exceeds the set threshold between samples 2,900 and 3,000, as shown in Figure 14.

With the analysis of the cell voltage curves, the voltage curves of cells 216 and 217 begin to deviate gradually as shown in Figure 15. The higher cell voltage in the first few minutes is due to the charging of the vehicle. The early warning of this method is one day earlier than the actual alarm and can identify abnormal cells as shown in Figure 16.

Prediagnosis of battery cells based on kurtosis is necessary. No outlier cells were found when MDS and DBSCAN were applied to normal vehicles. The DBSCAN results of normal vehicles are shown in Figures 17(a) and 17(b). More importantly, running MDS and DBSCAN for every window would increase the computing load of the data center.

##### 4.4. Fault Quantification

Based on the BIAS, cells 216 and 217 are quantified to determine overvoltage or undervoltage, respectively. The cell consistency of the vehicle deteriorates from the day before the alarm. The decrease in cell 216 is almost equal to the increase in cell 217 as shown in Table 3. The reason is that cells 216 and 217 are connected in parallel, and changes in the internal resistance or capacity of one cell affect the other.

According to formula (17), the inconsistency scores of the 10 vehicles are shown in Figure 18. The higher the score, the worse the consistency of the vehicle. The cell voltage consistency of nine vehicles is fair, while the inconsistency score of vehicle 5 is significantly higher than that of the other vehicles.

##### 4.5. Method Comparison

This method is compared with common unsupervised methods such as the 3σ method [4] and the entropy-based method [6]. The ability of multiscale sample entropy to diagnose abnormal cells is tested by calculating the multiscale entropy of the 336 cells. Generally, the template length is 2, and the tolerance is chosen as a fraction of the SD of the series. In this paper, the tolerance is selected by trial and error, and the scale factor is 15. In practice, it is found that multiscale sample entropy needs more data; otherwise, no matching vectors are found during the calculation, and the sample entropy is undefined. The multiscale sample entropy of each cell of vehicle 5 is shown in Figure 19(a), and Figure 19(b) shows the multiscale sample entropy of a normal vehicle. The sample entropies of the individual cells are all very small, so it is difficult to identify the faulty cells.
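The data requirement mentioned above can be made concrete with a short sketch of sample entropy and coarse-graining. The tolerance factor `r=0.2` and the function names are illustrative assumptions, not values from the paper. The point of the sketch is that the entropy is returned as NaN when no template pairs match.

```python
import numpy as np

def coarse_grain(x, scale):
    """Average consecutive, non-overlapping blocks of `scale` samples."""
    x = np.asarray(x, dtype=float)
    n = len(x) // scale
    return x[:n * scale].reshape(n, scale).mean(axis=1)

def sample_entropy(x, m=2, r=0.2):
    """Sample entropy with template length m and tolerance r * SD.

    Returns NaN when no matching templates exist (entropy undefined).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    tol = r * x.std()

    def matches(length):
        # n - m templates for both lengths, as in the usual definition
        t = np.array([x[i:i + length] for i in range(n - m)])
        d = np.abs(t[:, None, :] - t[None, :, :]).max(axis=-1)   # Chebyshev distance
        return ((d <= tol).sum() - len(t)) / 2                   # pairs with i < j

    b = matches(m)          # matching pairs of length m
    a = matches(m + 1)      # matching pairs of length m + 1
    if a == 0 or b == 0:
        return float("nan")
    return float(-np.log(a / b))
```

For example, coarse-graining a 100-sample series at scale 15 leaves only 6 points, which is far too few for templates to find matches, so the entropy at that scale is typically undefined unless much longer records are used.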

The 3σ method is also used for cell voltage fault diagnosis, and it can accurately identify faults in advance, as shown in Figure 20(a). However, some normal cells are diagnosed as faulty cells, as shown in Figure 20(b). This is because the cell voltage data are not normally distributed, as verified in Section 4.1.

#### 5. Conclusions

In this paper, a novel method for lithium-ion battery fault diagnosis of EVs based on real-time voltage is presented. The effectiveness of the method is verified with real-time data collected by EVs. The related conclusions are drawn as follows:

(1) The cell voltage distribution of some types of EVs is verified not to be a normal distribution in this work. The reasons are the limited number of cells and the interaction of the cells in series and parallel. Other reasons include the influence of the balance system of the BMS and the different locations of the battery packs.

(2) Kurtosis is used as an early warning indicator to find faulty cells for the first time. It is verified that kurtosis has a better ability to detect cell faults than SD. The faulty cells are identified through MDS and DBSCAN. The faulty cells were detected one day earlier than with the previous method. Furthermore, the BIAS judges the type of cell fault and quantifies the degree of the fault, and the inconsistency score evaluates the cell inconsistency of EVs.

(3) Comparisons with the entropy-based method and the 3σ method verify the superiority of this method. It is difficult to find faulty cells in advance with the entropy-based method. The 3σ method can accurately identify the cell faults in advance, but some normal cells are diagnosed as faulty cells.

Future work can be carried out in the following areas:

(1) The existing data preprocessing methods do not take the operating principle of EVs into account. In the future, some battery parameters will be combined with machine learning to propose a more efficient and accurate filling strategy.

(2) In this paper, the battery fault diagnosis model is established based on the cell voltage. In the future, a fault diagnosis model based on multivariate information fusion will be established by combining the temperature, voltage, and capacity data of the cells.

(3) The charge and discharge states of EVs should be discussed separately in subsequent studies, which can better distinguish the differences between the static and dynamic processes of EVs.

#### Data Availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

#### Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.