#### Abstract

Nonintrusive industrial load identification can accurately acquire the operation data of each load in a plant, which benefits intelligent power management. Identifying industrial loads is complicated and difficult in practice because transient data for modeling are hard to collect and high-precision measuring equipment is required. To address this, the article proposes a nonintrusive industrial load identification method using a random forest algorithm and the steady-state waveform. First, the industrial load power state is monitored, and when the load changes and becomes stable again, the steady-state waveform is extracted. Because industrial loads have different electrical characteristics, their current waveforms differ to some extent, so characteristic data can be constructed for each industrial load from its own steady-state current waveform. Then, using the high-dimensional steady-state waveform data as the sample data, the bootstrap sampling method and the CART algorithm in the random forest algorithm are used to generate multiple decision trees. Finally, the industrial load types are identified by voting among the decision trees. The actual operating load data of a factory are used as the sample data in the simulation, and the effectiveness and rapidity of the proposed identification algorithm are verified by comparative simulations with combined loads. The simulation results show that the accuracy of the proposed identification algorithm is more than 99%, which is much higher than that of the other methods, and that the identification time is 3.36 s, which is less than that of the other methods. Therefore, the proposed identification algorithm can effectively realize nonintrusive industrial load identification.

#### 1. Introduction

With the continuous development of smart grids, guided by big data and the concept of a conservation-oriented society, the collection of load power data and power optimization management in the power industry have become increasingly important, and research on nonintrusive load identification methods is necessary [1]. Traditional load monitoring devices are generally installed on the equipment side, so the equipment operation status can be collected without load identification, but the cost is high and the installation is inconvenient [2]. At the same time, industrial loads run in continuous production and have high safety requirements [3]. The low cost, convenient installation, and easy promotion of nonintrusive industrial load identification are therefore important advantages. It can not only provide an important basis for the fine management of the power grid but also provide a reference for users to use electricity reasonably [4, 5]. Nonintrusive industrial load identification can also improve the overall reliability and profitability of factories and related industrial systems [6].

At present, the focus and difficulty of nonintrusive load monitoring lie in load identification, and much of the related research concerns household load identification [7, 8]. Nonintrusive load identification methods focus on clustering algorithms [9, 10], the V-I trajectory [11], artificial intelligence algorithms [12, 13], etc. In terms of the selection of electrical characteristics, studies are divided into transient [14–16] and steady state [17]. Xu et al. [18] applied a deep belief network method, which used multiple restricted Boltzmann machines and a back-propagation neural network layer to carry out load identification. Wu et al. [19] proposed monitoring at the metering end of the total low-voltage electricity consumption and studied a waveform analytic identification method at the edge of the event. Gillis et al. [20] presented a new concept based on wavelet design and machine learning applied to nonintrusive load monitoring that can improve the prediction accuracy. Fang et al. [21] proposed a novel NILM method that leverages advances in statistical learning to achieve superior performance. Zhou et al. [22] proposed a nonintrusive load decomposition method based on a hybrid deep learning model that improves the performance of the whole network system.

However, most current algorithms have a complex feature extraction process, slow recognition speed, and a limited application scope, and there are relatively few studies on industrial load identification. Because production demands make it inconvenient to collect transient characteristics of industrial loads for modeling, extracting data for modeling and identification without affecting load operation becomes critical [23, 24]. This paper studies industrial loads in operation. The proposed method requires few electrical characteristic data, uses a simple training dataset, and is convenient for sampling, which improves identification accuracy and operating efficiency. The contributions are as follows:

(1) A feature extraction method based on the steady-state current waveform is presented to describe the characteristics of the industrial load.

(2) A nonintrusive industrial load identification method is proposed based on the random forest algorithm and the voting method.

(3) Based on the operating load data of an actual factory, collected with Henghe DL850 acquisition equipment, extensive simulations are carried out to verify the effectiveness and rapidity of the proposed method.

(4) The flexibility, practicality, and scalability of the proposed identification algorithm are further analyzed and discussed according to the simulations and comparisons.

#### 2. Principle of Nonintrusive Industrial Load Identification

The principle of nonintrusive load identification is shown in Figure 1. It comprises five stages: data collection, data processing, state change detection, feature selection, and load identification [25].

##### 2.1. Data Collection and Processing

For the collection of industrial load characteristics, due to the high voltage of the industrial load, electrical data are mainly obtained by using smart meters or professional collection equipment through the low-voltage side. The collected data include active power, reactive power, voltage, and current. The collected data need to be preprocessed to meet the requirements of the algorithm program parameter input.

##### 2.2. State Change

Before judging the load type, it is necessary to determine whether the load state has changed, either because a load has started or stopped or because the same load has switched between operating states. During a load state change, the load power changes markedly. Generally, the change in power within a specified time is compared with a threshold value to judge whether a load switching event has occurred.

##### 2.3. Feature Selection

This paper uses the load power as the feature for detecting state changes and uses the steady-state current data as the load feature during the load identification process.

##### 2.4. Load Identification

After judging the load status change, this article collects steady-state current waveform data, uses the random forest algorithm to identify the load, and judges the load type and load state.

#### 3. Industrial Load Feature Extraction

In this paper, the industrial load feature extraction is based on the monitoring of the industrial load power state changes through events, and then the steady-state current waveform is extracted.

##### 3.1. Event-Based Monitoring of Industrial Load Power

During the normal operation of industrial loads, it is generally not possible to repeat start and stop operations, so it is more convenient to collect data while the load is working stably. The voltage and current waveforms of the load are generally sine waves, so a change in load status is mainly reflected in changes in power and current. Usually, a change point detection algorithm is applied to the active power data to capture the moment when the load status changes, and then the relevant event characteristics are extracted.

Assume that, during the period *t*_{S} (*t*_{S} is an integer multiple of the sampling period *T*), the state of the electrical load transitions from one stable state to another stable state. When the load state changes, the power of the load also changes, so the change of load state can be identified by the step characteristic of active power. We denote the mutation amount of the state change by Δ*P*_{S} and the threshold value by *H*, and the calculation formula is shown in formula (1). When Δ*P*_{S} exceeds the threshold *H*, it can be considered that a load event occurs at time *t*.

Due to the nature of the industrial load, low-frequency sampling is adopted for identification. First, the active power data are read into the database through professional acquisition equipment, and then the active power changes are monitored in real time. Finally, when the active power changes meet formula (2), the load status changes. We can also use the steady-state electrical component difference method to extract the electrical features. As shown in Figure 2, the current and active power change after the load status changes.
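The threshold test described above can be sketched in a few lines. This is only an illustration under stated assumptions: the step is taken as the absolute difference in active power over a window of *t*_{S} samples, and the function name `detect_load_events`, the parameter names, and the synthetic power trace are hypothetical rather than taken from the paper.

```python
from typing import List


def detect_load_events(power: List[float], window: int, threshold: float) -> List[int]:
    """Return sample indices at which a load switching event is flagged.

    power     -- active power samples taken at a fixed sampling period T
    window    -- t_S expressed in samples (t_S is an integer multiple of T)
    threshold -- H, the smallest power step that counts as an event
    """
    events = []
    t = 0
    while t + window < len(power):
        # Assumed form of the step: absolute change in power over the window.
        delta = abs(power[t + window] - power[t])
        if delta > threshold:
            events.append(t + window)
            t += window  # skip ahead so that one step is reported only once
        else:
            t += 1
    return events


# Synthetic trace: 1.0 kW baseline, then a 2.5 kW load switches on at sample 5.
trace = [1.0] * 5 + [3.5] * 5
events = detect_load_events(trace, window=1, threshold=1.0)
print(events)
```

In practice the threshold would be chosen above the normal fluctuation of the monitored power and below the smallest load step of interest.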

**Figure 2.** Current and active power changes after a load status change, panels (a) and (b).

##### 3.2. Extract Industrial Load Characteristics Based on Steady-State Waveforms

When event *A* is detected, we need to extract the load waveform. Usually, the steady-state waveform can characterize the basic state of the load during normal operation. In this paper, the steady-state waveform of the event is extracted as the event waveform for analysis. Therefore, it is necessary to judge when the event is in a steady state before and after the occurrence. When there is no significant change in active power for multiple cycles, the current signal is considered in a stable state.

The fundamental phase angle of the steady-state current is determined by the initial phase of the voltage during measurement. The steady-state currents must therefore be measured at the same initial voltage phase angle so that current superposition holds. Accordingly, once the current is judged to be in a steady state, the waveform can be extracted by detecting the zero-crossing point of the corresponding voltage. Let *U*_{m − 1} be the voltage corresponding to the steady-state current *I*_{m − 1} before the event, and let *U*_{m − 1, j} be its value at sampling point *j*. When *U*_{m − 1, j} satisfies formula (3), this point is considered to be the zero-crossing point of the steady-state voltage waveform *U*_{m − 1} before the event occurs, where *j* is the serial number of the sampling point at which the voltage crosses zero with an upward trend. Starting from the moment corresponding to the zero-crossing point, one cycle is intercepted to extract the steady-state current *I*_{m − 1} before the occurrence of event *A*.

In the same way, the above method is used to detect the voltage zero-crossing point of the steady-state voltage waveform *U*_{s} after the occurrence of event *A*, and then the steady-state periodic current *I*_{s} is extracted. According to the principle of current superposition, we use formula (4) to obtain the event current waveform *I* of *A* and the corresponding voltage waveform *U* of *A*.
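The extraction and superposition steps can be illustrated with a short sketch. Formulas (3) and (4) are not reproduced here, so two assumptions are made and should be read as such: the upward zero crossing is taken as the first sample at or above zero that follows a negative sample, and the event current is taken as the steady-state cycle after the event minus the cycle before it. All function names and the synthetic signals are hypothetical.

```python
import math
from typing import List


def rising_zero_crossing(voltage: List[float]) -> int:
    """Index j of the first non-negative sample that follows a negative one."""
    for j in range(1, len(voltage)):
        if voltage[j - 1] < 0.0 <= voltage[j]:
            return j
    raise ValueError("no rising zero crossing in this voltage record")


def steady_cycle(current: List[float], voltage: List[float], n: int) -> List[float]:
    """One cycle (n samples) of current, starting at the voltage zero crossing."""
    start = rising_zero_crossing(voltage)
    if start + n > len(current):
        raise ValueError("record too short for one full cycle")
    return current[start:start + n]


def event_current(before: List[float], after: List[float]) -> List[float]:
    """Assumed superposition: event waveform = cycle after minus cycle before."""
    return [a - b for a, b in zip(after, before)]


N = 200  # samples per cycle, e.g. 10 kHz sampling of a 50 Hz waveform


def phase(k: int, offset: float) -> float:
    return 2.0 * math.pi * (k + offset) / N


# Two records that start at different, arbitrary points of the voltage cycle.
u_before = [math.sin(phase(k, 50.5)) for k in range(2 * N)]
i_before = [2.0 * math.sin(phase(k, 50.5)) for k in range(2 * N)]
u_after = [math.sin(phase(k, 120.5)) for k in range(2 * N)]
i_after = [2.0 * math.sin(phase(k, 120.5)) + 3.0 * math.sin(phase(k, 120.5) - 0.5)
           for k in range(2 * N)]

delta = event_current(steady_cycle(i_before, u_before, N),
                      steady_cycle(i_after, u_after, N))
print(len(delta), round(max(delta), 3))
```

Because both cycles start at the same voltage phase, the subtraction leaves only the current of the newly switched load (here the 3 A component lagging by 0.5 rad), even though the two records began at different instants.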

A detailed description of the steady-state current waveform characteristics provides an important basis for load identification. Therefore, the features extracted from the steady-state current of the load in this paper include the peak value, peak-to-valley value, root mean square, and standard deviation. At the same time, the steady-state current waveform itself is used as an industrial load imprint to identify various loads. Figure 3 shows the steady-state waveform characteristics of industrial loads.
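As a rough illustration of these four features, the sketch below computes them for one cycle of current. It assumes that "peak valley" means the difference between the maximum and minimum values and that the standard deviation is the population form; the function name `steady_state_features` and the 5 A test waveform are hypothetical.

```python
import math
from statistics import pstdev
from typing import Dict, List


def steady_state_features(cycle: List[float]) -> Dict[str, float]:
    """Summary features of one steady-state current cycle."""
    peak = max(cycle)
    valley = min(cycle)
    rms = math.sqrt(sum(x * x for x in cycle) / len(cycle))
    return {
        "peak": peak,
        "peak_to_valley": peak - valley,  # assumed meaning of "peak valley"
        "rms": rms,
        "std": pstdev(cycle),  # population standard deviation (assumption)
    }


# One cycle of a 5 A sinusoidal current sampled at 200 points per cycle.
cycle = [5.0 * math.sin(2.0 * math.pi * k / 200) for k in range(200)]
features = steady_state_features(cycle)
print({name: round(value, 4) for name, value in features.items()})
```

For a pure sinusoid the mean is zero, so the standard deviation coincides with the root mean square; the two features differ only for waveforms with a DC offset.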

#### 4. Industrial Load Identification Based on the Random Forest Algorithm

Random forest is characterized by strong noise immunity and randomness [26]. It is an ensemble learning model with good predictive ability. As a combined classifier that uses decision trees as base classifiers, it reduces the limitations of a single classifier to a certain extent and can achieve higher prediction accuracy. At the same time, the randomness of the random forest makes it more tolerant of outliers and noise, which reduces the overfitting of the decision tree algorithm to a certain extent and improves the generalization ability. The specific steps of industrial load identification based on the random forest are described below and shown in Figure 4.

##### 4.1. Industrial Load Sample Dataset

The collected steady-state current waveform data are preprocessed by a linear regression algorithm to remove abnormal values and irrelevant data. Then, the current waveforms are grouped into *k* pieces to form a sample dataset.

##### 4.2. Bootstrap Algorithm

Bootstrap is a sampling method with replacement. First, a certain amount of sample data is collected. Each sample that is drawn is put back into the sample dataset, so it can be selected again in the next draw. After sampling the sample data *N* times, *N* training sets are obtained, and each training set is used to build one decision tree, as shown in Figure 5.
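A minimal sketch of bootstrap sampling follows. It assumes that each training set has the same size as the original sample dataset, which is the usual convention but is not stated in the text; the function name `bootstrap_sets` and the toy data are hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def bootstrap_sets(samples: Sequence[T], n_sets: int, seed: int = 0) -> List[List[T]]:
    """Draw n_sets training sets from samples, sampling with replacement."""
    rng = random.Random(seed)  # fixed seed so that the sketch is repeatable
    size = len(samples)
    return [
        [samples[rng.randrange(size)] for _ in range(size)]  # a sample may repeat
        for _ in range(n_sets)
    ]


data = ["w0", "w1", "w2", "w3", "w4", "w5"]  # stand-ins for waveform samples
training_sets = bootstrap_sets(data, n_sets=3)
for training_set in training_sets:
    print(training_set)
```

Because sampling is done with replacement, some samples typically appear several times in a training set while others are left out, which is what makes the individual decision trees differ from one another.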

##### 4.3. Based on the CART Decision Tree

Random forest is a combined classifier that uses decision trees as base classifiers. A decision tree is built by repeatedly splitting nodes, starting from the root node, until leaf nodes are generated. The leaf nodes represent the classification results and are not split further [27]. Each training set generates a decision tree according to the CART algorithm. When a tree is grown, *m* (*m* ≤ *M*) feature variables are randomly selected from the total *M* attributes in the data sample set, the CART algorithm is used to find the optimal splitting attribute among these *m* features, and the node is split on that attribute before proceeding to the next split. *N* decision trees are constructed in this way, as shown in Figure 6.

This paper adopts the CART algorithm and selects the splitting attribute at each node of the binary tree according to the minimum Gini index criterion. The Gini index of a probability distribution is given in formula (5) [27]:

$$\mathrm{Gini}(p) = \sum_{k=1}^{K} p_k (1 - p_k) = 1 - \sum_{k=1}^{K} p_k^2, \tag{5}$$

where *K* is the total number of classes of feature samples in the node and *p*_{k} is the probability that a feature sample in the node belongs to class *k*.

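The Gini index of a sample set *D* can be calculated using the following formula:

$$\mathrm{Gini}(D) = 1 - \sum_{k=1}^{K} \left( \frac{|D_k|}{|D|} \right)^2, \tag{6}$$

where *D*_{k} is the subset of samples in *D* that belong to class *k* and *K* is the number of classes.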

The Gini index is evaluated on binary divisions of the attributes. If a certain attribute *A* takes discrete values, the sample set *D* is divided according to *A* into two subsets *D*_{1} and *D*_{2}. The Gini index of this division at a node can be calculated by the following formula:

$$\mathrm{Gini}(D, A) = \frac{|D_1|}{|D|}\,\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\,\mathrm{Gini}(D_2). \tag{7}$$

Therefore, the impurity reduction achieved by splitting on *A* is as shown in the following formula:

$$\Delta \mathrm{Gini}(A) = \mathrm{Gini}(D) - \mathrm{Gini}(D, A). \tag{8}$$

It can be seen that, in each binary division, the attribute whose division gives the smallest Gini index is selected. Equivalently, the larger the impurity reduction of attribute *A*, the better its splitting effect.
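The selection rule can be made concrete with a small sketch that uses the standard Gini impurity, one minus the sum of squared class proportions, and the size-weighted impurity of the two subsets of a binary split. The text describes discrete attributes; splitting a continuous feature at a threshold, as done here, is an assumption made for illustration, and the function names and the six toy samples are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple


def gini(labels: List[str]) -> float:
    """Gini impurity of a set of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_split(left: List[str], right: List[str]) -> float:
    """Size-weighted Gini impurity of a binary split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(values: List[float], labels: List[str]) -> Tuple[Optional[float], float]:
    """Threshold on one feature that gives the smallest split impurity."""
    best: Tuple[Optional[float], float] = (None, float("inf"))
    for thr in sorted(set(values))[:-1]:  # the largest value cannot split the set
        left = [lab for v, lab in zip(values, labels) if v <= thr]
        right = [lab for v, lab in zip(values, labels) if v > thr]
        score = gini_split(left, right)
        if score < best[1]:
            best = (thr, score)
    return best


# Toy data: RMS current (A) of six samples from two load types.
rms = [1.0, 1.2, 1.1, 4.0, 4.2, 3.9]
load = ["lamp", "lamp", "lamp", "motor", "motor", "motor"]
threshold, impurity = best_threshold(rms, load)
print(gini(load), threshold, impurity)
```

For this toy set the unsplit impurity is 0.5 and the split at 1.2 A separates the two load types completely, so the impurity reduction equals the full 0.5.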

##### 4.4. Identification Results Based on the Voting Method

The dataset to be tested is input into the *N* trained decision tree models, and each decision tree predicts a load type based on the parameters learned from the sample dataset. The *N* predictions are then put to a vote, and the load type that receives the highest proportion of votes is taken as the industrial load identification result.
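The voting step amounts to a majority count over the tree outputs, as the following sketch shows. The function name `majority_vote` and the example predictions are hypothetical, and the tie-breaking behaviour is a property of this sketch rather than something specified in the text.

```python
from collections import Counter
from typing import List, Tuple


def majority_vote(predictions: List[str]) -> Tuple[str, float]:
    """Return the most frequent label and its share of the votes.

    If several labels tie, the one that was predicted first is returned.
    """
    counts = Counter(predictions)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(predictions)


# Hypothetical outputs of 15 decision trees for one test waveform.
tree_outputs = ["motor A"] * 9 + ["heater"] * 4 + ["motor B"] * 2
label, share = majority_vote(tree_outputs)
print(label, share)
```

The vote share can also serve as a rough indication of how strongly the trees agree on a given sample.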

#### 5. Case Analysis

In this experiment, a Henghe DL850 acquisition device was used to collect industrial load operating data on the low-voltage side of the bus for load analysis and identification. The algorithm runs on a computer with an Intel Core i5-3470 CPU clocked at 3.2 GHz, 4 GB of memory, and a 64-bit operating system, in the MATLAB R2019a environment. Random forests with different numbers of decision trees are compared: the running time of the program is recorded, and the result for each number of trees is the average of 20 tests.

##### 5.1. Data Collection

After the device starts and stabilizes, voltage and current data are collected by the steady-state waveform extraction method, with the sampling frequency set to 10 kHz and the acquisition time set to 20 min. The fundamental frequency of the load is 50 Hz (a period of 0.02 s), so 200 points are sampled per cycle and 60,000 cycles are sampled in 20 min. The sampled load types include motor A, motor B, a heater, and an electric lamp. The specific load combinations and their labels are shown in Table 1, and the collected waveforms are shown in Figure 7.
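The sampling figures quoted above follow directly from the sampling frequency, the fundamental frequency, and the acquisition time. The symbols *f*_s, *f*_0, and *T*_acq are introduced here only for this check:

$$N_{\text{points}} = \frac{f_s}{f_0} = \frac{10\,000\ \text{Hz}}{50\ \text{Hz}} = 200, \qquad N_{\text{cycles}} = T_{\text{acq}} \, f_0 = (20 \times 60\ \text{s}) \times 50\ \text{Hz} = 60\,000.$$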

##### 5.2. Formation of Sample Data

To improve the quality of the sample data, the collected data are imported into MATLAB using XView software, and the sample data are obtained after preprocessing. The main steps are as follows:

(1) XView software decomposes the sampled data to obtain data containing time, current, voltage, and other information, which is the decomposition process of the collected signal.

(2) In MATLAB, 10,000 stable cycles of data are intercepted and arranged as 10 cycles per row, giving 1000 rows of data.

(3) The single-phase current signal is extracted for load identification and analysis. The sample data of each load therefore consist of rows of 10 cycles, 2000 points per row, forming a matrix of size 1000 × 2000.

(4) For each load, 700 sets of signals are chosen as training data and the remaining 300 sets as test data.

(5) The combined training data are 5600 groups and the combined test data are 2400 groups, and the random forest algorithm is used to train and test the sample data.
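The data layout in these steps can be checked with a short sketch. It assumes 700 training rows and 300 test rows per load, which is what the combined totals of 5600 and 2400 groups imply for eight load labels, and it simply takes the first rows as training data because the text does not say how the rows are chosen. The function names are hypothetical and a constant signal stands in for the measured current.

```python
from typing import List, Tuple


def to_rows(samples: List[float], cycles_per_row: int, points_per_cycle: int) -> List[List[float]]:
    """Cut a long single-phase current record into rows of whole cycles."""
    row_len = cycles_per_row * points_per_cycle
    n_rows = len(samples) // row_len  # an incomplete final row is dropped
    return [samples[i * row_len:(i + 1) * row_len] for i in range(n_rows)]


def split_rows(rows: List[List[float]], n_train: int) -> Tuple[List[List[float]], List[List[float]]]:
    """Assumed split: the first n_train rows for training, the rest for testing."""
    return rows[:n_train], rows[n_train:]


POINTS_PER_CYCLE = 200   # 10 kHz sampling of a 50 Hz waveform
record = [0.0] * (10_000 * POINTS_PER_CYCLE)  # placeholder for 10,000 stable cycles

rows = to_rows(record, cycles_per_row=10, points_per_cycle=POINTS_PER_CYCLE)
train, test = split_rows(rows, n_train=700)

n_loads = 8  # number of load labels in the combined dataset
print(len(rows), len(rows[0]), len(train) * n_loads, len(test) * n_loads)
```

Running the sketch gives 1000 rows of 2000 points per load and combined totals of 5600 training and 2400 test groups, matching the figures in the steps above.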

##### 5.3. Experimental Results and Analysis

The random forest model is run with an increasing number of decision trees, and the experimental results are shown in Table 2 and Figure 8. When the number of decision trees exceeds 15, the load identification accuracy reaches 100%, and the average running time is only 3.36 s.

The random forest algorithm with 15 decision trees is compared with the traditional Bayes algorithm and the *k*-nearest neighbor algorithm on the same sample set, and the results are shown in Table 3. The random forest model outperforms the Bayes algorithm in both recognition rate and running time, which makes it suitable for the needs of industrial load identification.

From Tables 2 and 3, it can be seen that industrial load identification based on the random forest performs well. The average identification rate over the eight sampled kinds of power loads reaches 100%. Compared with the other algorithms, industrial load identification based on the random forest shows a clear advantage in identification rate.

#### 6. Conclusion

The industrial load identification algorithm proposed in this paper is based on the random forest and the steady-state waveform. The random forest algorithm has a good prediction and classification effect, but it had not previously been applied to this field. Here it improves the accuracy and efficiency of industrial load identification on the whole. The algorithm mainly uses the current waveform as the recognition feature to classify and identify the electricity information of the industrial load in a nonintrusive way. The electrical characteristic information is distinct, and the extraction process is simple. The random forest algorithm offers good randomness, a good classification effect, and high classification efficiency. The simulation results show that industrial load identification based on the random forest model can overcome the difficulty of industrial load information collection and that the recognition rate reaches 100%, which is better than the other classification algorithms in terms of accuracy and running time. Nonintrusive industrial load identification based on the random forest and steady-state waveform can be further applied to other large factories and homes to improve identification accuracy and operational efficiency. However, the identification of loads with continuously varying states is not considered and needs further study.

#### Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.