Abstract

Traditional power load identification is greatly restricted in application because of its high cost and low efficiency. In this paper, a similarity model is established to realize noninvasive power load identification by building a feature database for the equipment. Firstly, the wavelet decomposition method and the wavelet threshold processing method are used to remove abnormal points and to reduce noise in the original data, respectively. Secondly, the transient and steady-state characteristics of electrical equipment (active power and reactive power, harmonic current, and voltage-current trajectory) are extracted, and the feature database for the equipment is established. Thirdly, the feature similarity is defined to describe the degree of similarity of any two devices under a given feature, and a similarity model for the automatic recognition of a single device is established. Finally, device identification and the calculation of power consumption are carried out for part of the data in Annex 2 of question A in the 6th “teddy cup” data mining challenge competition.

1. Introduction

With new types of power load components emerging continuously, users place higher requirements on the reliability, safety, economy, and stability of the power system. The smart grid emphasizes bidirectional interaction with users and encourages users to participate in power management through demand response, which is inseparable from detailed knowledge of load operation information. Since the traditional invasive load monitoring system is costly in time and investment and has a certain impact on the reliability of the system, it is necessary to develop an economical and effective noninvasive load monitoring and identification system. Hence, strengthening the monitoring of building power consumption is of great practical significance for energy conservation and the smart grid.

Noninvasive load monitoring technology has attracted much attention from power companies and scientific research institutions since it was proposed. Hart [1] established the first noninvasive appliance load monitoring (NIALM) system, a monitoring tool that affects the target as little as possible and can provide power companies with specific power consumption data for different electrical equipment. Li and Yu [2] further carried out research on noninvasive load monitoring and determined characteristic parameters based on fuzzy clustering results of steady-state load characteristics of electrical appliances, so as to realize noninvasive load monitoring based on a differential evolution algorithm. Liang et al. [3, 4] carried out a series of studies on load characteristics and comprehensively introduced the basic concept, system structure, feature methods, decomposition framework, system simulation application, and other aspects of noninvasive load monitoring. Cai et al. [5] calculated the similarity between the transient waveform and the fixed characteristic template in the electrical load characteristic database, established the electrical load characteristic membership matrix based on similarity, and determined the characteristic type of electrical load. Zheng et al. [6] studied the microcharacteristics of noninvasive load monitoring, established a household load characteristics database, and analyzed the load characteristics and extraction methods contained in the fundamental wave and multiple harmonics of current, voltage, and power, but did not give specific methods for completing the noninvasive identification of users' electrical loads. Huang et al. [7] employed instantaneous current and power waveforms and took the decomposed current waveforms as the characteristic values of two similar loads, which enabled the accurate identification of electrical appliances with similar current waveforms. Wu et al. [8] decomposed the sampling current to obtain the independent current generated by the start-up of electrical appliances and established a load identification algorithm based on entropy value discrimination to realize the decomposition and recognition of electrical loads. In practice, research on nonintrusive power load monitoring and decomposition mainly focuses on optimizing and improving electrical load feature extraction and load identification algorithms.

Noninvasive power load decomposition and monitoring refers to installing a sensor at the user's entrance to the grid; the device monitors the power consumption and working condition of each piece or each type of electrical equipment by collecting and analyzing the total power or total current. Hence, power companies can understand the power consumption rules and usage patterns of each piece or each type of electrical equipment in the user’s home, as shown in Figure 1. The monitoring data of household power load provide a scientific basis for the prediction of load usage in the power system and support sound decision-making [9]. This paper takes question A in the 6th “teddy cup” data mining challenge competition as the research background. Firstly, the transient and steady-state characteristics of the electrical equipment are extracted from the original data; secondly, the equipment feature database is established; finally, the similarity model is established to realize noninvasive power load identification.

The data used to support the findings of this study are available at the teddy cup data mining challenge website (http://www.5iai.com/bdrace/tzjingsai/20170921/1253.html#sHref).

2. Data Processing and Establishment of Feature Database

2.1. Data Preprocessing

Table 1 shows the known equipment data and parameters.

2.1.1. Abnormal Points Processing

In this paper, the wavelet decomposition value method is adopted to detect and distinguish abnormal points and mutation points [10]. The specific algorithm is as follows:

Step 1. The fitting residuals are decomposed online at two wavelet scales.

Step 2. The modulus of the wavelet decomposition coefficient at each of the two scales is calculated, and the difference between the two moduli is obtained.

Step 3. Abnormal points and mutation points are detected and distinguished from this difference.

The active power data of YD1-YD11 were tested by the above outlier test method. Figure 2 shows the abnormal point test results of equipment YD4 in the period from 60 seconds to 290 seconds.
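The three steps above can be illustrated with a minimal sketch. It is not the implementation used in this paper: the Haar wavelet, the undecimated form of the decomposition, the three-sample search window, and the function names `haar_details` and `classify_points` are all assumptions made for illustration. The idea is that an isolated spike loses modulus when moving from the fine scale to the coarse scale, whereas a level shift gains modulus.

```python
import math

def haar_details(x):
    """Undecimated Haar detail coefficients at scales 1 and 2 (assumed wavelet)."""
    n = len(x)
    d1 = [0.0] * n
    d2 = [0.0] * n
    for k in range(1, n):
        d1[k] = (x[k] - x[k - 1]) / math.sqrt(2)
    for k in range(3, n):
        d2[k] = ((x[k] + x[k - 1]) - (x[k - 2] + x[k - 3])) / 2.0
    return d1, d2

def classify_points(x, threshold):
    """Label large jumps as 'abnormal' (isolated spike) or 'mutation' (level shift)."""
    d1, d2 = haar_details(x)
    labels = {}
    for k in range(1, len(x)):
        m1 = abs(d1[k])
        if m1 < threshold:
            continue
        if labels.get(k - 1) == "abnormal":
            continue  # falling edge of the spike already found at k - 1
        # Largest scale-2 modulus in a short window starting at the jump.
        m2 = max(abs(v) for v in d2[k:k + 3])
        # Positive difference: modulus decays with scale, so treat as a spike.
        labels[k] = "abnormal" if m1 - m2 > 0 else "mutation"
    return labels
```

For a constant sequence with one spike and one later step, the sketch labels the spike index as abnormal and the step index as a mutation; real measurements would need a threshold chosen from the noise level.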

2.1.2. Noise Reduction Processing

We reduce noise in the data through wavelet threshold processing [11].

Wavelet noise reduction separates the signal from the noise by exploiting their different behavior in the time and frequency domains, so as to obtain a better noise reduction effect.

Let $f(t)$ be the signal $s(t)$ polluted by noise. Its basic model can be expressed as

$$f(t) = s(t) + \sigma e(t),$$

where $e(t)$ is the noise and $\sigma$ is the noise intensity.

After wavelet noise reduction, the processed data are obtained and the waveform is drawn in MATLAB. For reasons of space, one sampling period of YD1’s cycle data is taken as an example here; the signal after noise reduction is shown in Figure 3.
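Wavelet threshold noise reduction can be sketched in a few lines. The paper performs this step in MATLAB and does not state the wavelet, the decomposition depth, or the threshold rule, so the single-level Haar transform, the soft threshold, and the universal threshold used below are assumptions, as are all function names.

```python
import math

def haar_forward(x):
    """One level of the orthonormal Haar transform (even-length input)."""
    s = math.sqrt(2)
    approx = [(x[k] + x[k + 1]) / s for k in range(0, len(x), 2)]
    detail = [(x[k] - x[k + 1]) / s for k in range(0, len(x), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Invert haar_forward."""
    s = math.sqrt(2)
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) / s)
        x.append((a - d) / s)
    return x

def soft_threshold(value, t):
    """Shrink a coefficient toward zero by t."""
    return math.copysign(max(abs(value) - t, 0.0), value)

def universal_threshold(detail, n):
    """sigma * sqrt(2 ln n), with sigma estimated from the median detail modulus."""
    mags = sorted(abs(d) for d in detail)
    mid = len(mags) // 2
    median = mags[mid] if len(mags) % 2 else 0.5 * (mags[mid - 1] + mags[mid])
    sigma = median / 0.6745
    return sigma * math.sqrt(2.0 * math.log(n))

def denoise(x):
    """Threshold the detail coefficients and reconstruct the signal."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    approx, detail = haar_forward(x)
    t = universal_threshold(detail, len(x))
    return haar_inverse(approx, [soft_threshold(d, t) for d in detail])
```

A multi-level decomposition with a smoother wavelet would normally be preferred for measured power data; the single level here only keeps the sketch short.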

2.2. Establishment of Feature Database
2.2.1. Transient Feature Extraction

Transient characteristics refer to the characteristics shown when the working state of electrical appliances changes. As shown in Figure 4, the transient power waveform of electrical appliances' start-up is a typical load mark.

The following part analyzes how the transient characteristics are computed and what they reveal about the load. Four features used in noninvasive load monitoring are considered: the mean, the root-mean-square, the transition time of the transient, and the multiple of impulse power (current) [12].

(1) Mean. To calculate the mean value of a signal $x(t)$, the signal waveform is integrated over a period of time:

$$\bar{x} = \frac{1}{T}\int_{0}^{T} x(t)\,dt,$$

where $T$ is the integral time.

(2) Root-Mean-Square. The root-mean-square reflects the fluctuation of a signal relative to its mean value. The root-mean-square of a signal $x(t)$ is used to represent the voltage of an alternating-current waveform and is defined as

$$x_{\mathrm{rms}} = \sqrt{\frac{1}{T}\int_{0}^{T} x^{2}(t)\,dt}.$$

(3) Transition Time. Let the start time of the transient process be $t_1$ and the end time of the transient process be $t_2$; then the transition time can be calculated by the following equation:

$$\Delta t = t_2 - t_1.$$

(4) Multiple of Impulse Power (Current). The multiple of impulse power (current) is calculated as

$$k = \frac{P_{\max} - P_{1}}{P_{2} - P_{1}},$$

where $P_{\max}$ is the maximum power in the process of transient switching, $P_{1}$ is the steady-state average power before the electrical appliance is switched in, and $P_{2}$ is the steady-state average power after the electrical appliance is switched in.

Applying the above definitions and the single-state data provided in Annex 1, the transient characteristics database shown in Table 2 is obtained.

As can be seen from Table 2, electrical loads pass from switch-on to the steady state in various ways. A purely resistive load enters the steady state directly at start-up, while other loads exhibit a pulse current, with differing start-up times and pulse sizes. Because the switching transients of different loads differ, the transient characteristics can be used to distinguish electrical equipment.
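For sampled data, the four transient features reduce to short discrete computations, sketched below. The function names are hypothetical, the integrals are replaced by averages over uniformly spaced samples, and the form used for the multiple of impulse power (peak rise divided by steady-state rise) is an assumption built from the three quantities named in the text.

```python
import math

def mean_value(samples):
    """Discrete counterpart of (1/T) * integral of x(t) over the window."""
    return sum(samples) / len(samples)

def rms_value(samples):
    """Discrete root-mean-square over the window."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def transition_time(t_start, t_end):
    """Duration of the transient process."""
    return t_end - t_start

def impulse_multiple(p_max, p_before, p_after):
    """ASSUMED form: transient peak rise relative to the steady-state rise."""
    return (p_max - p_before) / (p_after - p_before)
```

Under this assumed form, a purely resistive load that settles without overshoot has a multiple close to 1, while a load with a start-up pulse has a multiple greater than 1.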

2.2.2. Steady-State Feature Extraction

The steady-state characteristics refer to the characteristics of electrical appliances in a stable operation state. In other words, they are obtained by analyzing the differences in certain characteristics between two stable operation states [13]. This paper uses the V-I trajectory, the power characteristics, and the harmonic matrix.

(1) V-I Trajectory. The shape features adopted by the V-I trajectory method mainly include the current span, trajectory area, absolute area, standard deviation of instantaneous resistance, curvature, slope, total area, left and right areas, asymmetry, intersection point, etc. [14]. To prevent differences in the voltage and current amplitudes of different loads from affecting the size of the V-I trajectory, the two parameters must be normalized before the shape features are compared. Using the frequency data provided in Annex 1, the V-I trajectory curves of some equipment are drawn with the normalized voltage as the abscissa and the normalized current as the ordinate, as shown in Figures 5 and 6.

As can be seen from Figures 5 and 6, for resistive loads, such as the Joyang hot pot, the V-I trajectory is a straight line, while for a load with high harmonic content, such as the Midea microwave, the V-I trajectory contains at least one intersection point. The two kinds of trajectories differ significantly, so the V-I trajectory can be used as a distinguishing feature of electrical equipment. The following shape features are used.

(1) Current span $I_{\mathrm{span}}$, which is defined as

$$I_{\mathrm{span}} = \max(I) - \min(I),$$

where $I$ is the current sequence and $\max(I)$ and $\min(I)$ represent the maximum and minimum values of the current sequence.

(2) The trajectory area of the normalized V-I trajectory curve. The normalized voltage sequence $v_n$ is obtained from the voltage sequence $V_n$ as

$$v_n = \frac{V_n}{V_{\max}},$$

where $V_{\max}$ is the maximum value of the voltage sequence, $N$ is the number of sampling points in a period, and $M$ is the number of preset interpolation points. The normalized current sequence $i_n$ is obtained from the current sequence $I_n$ as

$$i_n = \frac{I_n}{I_{\max}},$$

where $I_{\max}$ is the maximum value of the current sequence. The trajectory area is the area enclosed by the normalized curve, evaluated with reference to the maximum voltage point and the minimum voltage point.

(3) The absolute area of the normalized V-I trajectory curve.

(4) The standard deviation of instantaneous resistance [15], which is the standard deviation of

$$R_n = \frac{v_n}{i_n}$$

about its average value $\bar{R}$, where $R_n$ is the instantaneous resistance of the $n$-th sampling point, $v_n$ is the normalized voltage value of the $n$-th sampling point, $i_n$ is the normalized current value of the $n$-th sampling point, $N$ is the number of sampling points in a period, and $M$ is the number of preset interpolation points.
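Some of these shape features can be sketched for one period of sampled voltage and current. The sketch is illustrative only: the function names are hypothetical, no interpolation points are added, samples with near-zero current are skipped when the instantaneous resistance is formed (an assumption made to avoid division by zero), and the enclosed area is computed with the shoelace formula as a stand-in, since the exact area formula of the paper is not reproduced here.

```python
import math

def normalize(seq):
    """Divide a sequence by its maximum value."""
    peak = max(seq)
    return [s / peak for s in seq]

def current_span(current):
    """Difference between the largest and smallest current samples."""
    return max(current) - min(current)

def signed_area(v, i):
    """Shoelace area of the closed polygon through the points (v_n, i_n)."""
    n = len(v)
    total = 0.0
    for k in range(n):
        nxt = (k + 1) % n
        total += v[k] * i[nxt] - v[nxt] * i[k]
    return 0.5 * total

def resistance_std(v, i, eps=1e-9):
    """Standard deviation of v_n / i_n, skipping samples with |i_n| <= eps."""
    r = [vk / ik for vk, ik in zip(v, i) if abs(ik) > eps]
    mean_r = sum(r) / len(r)
    return math.sqrt(sum((rk - mean_r) ** 2 for rk in r) / len(r))
```

For a purely resistive load the normalized voltage and current coincide, so both the enclosed area and the standard deviation of instantaneous resistance are close to zero, which matches the straight-line trajectory described above.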

According to the magnitude of the power, the working state of the equipment is divided into several gears; the greater the power, the higher the gear. Devices 1 to 11 have at most five working states, so the working state of a device is divided into five levels. Device data over a one-second period are randomly selected from each running state to draw the V-I trajectory. Based on the above steps and the single-state data provided in Annex 1, the V-I trajectory feature database is obtained; Table 3 lists the V-I trajectory features of gear 1 of each device (a default entry indicates that the device does not have this gear).

As can be seen from Table 3, the V-I trajectory characteristics of gear 1 differ considerably between devices, especially the current span and the standard deviation of instantaneous resistance, and the resulting trajectories are clearly different, so the V-I trajectory characteristics can be used to distinguish electrical equipment.

(2) Power Characteristics. Active power is the power consumed by the load during operation. If the load is purely resistive, the voltage and current waveforms are always in phase, so there is no reactive component. When inductive or capacitive elements are present, however, there is a phase shift between the current and voltage waveforms, which produces or consumes reactive power. Active power $P$ and reactive power $Q$ are calculated as follows [16]:

$$P = \sum_{k} U_k I_k \cos\varphi_k, \qquad Q = \sum_{k} U_k I_k \sin\varphi_k,$$

where $U_k$ is the effective value of the voltage when the power load is running, $I_k$ is the effective value of the current when the power load is running, $\varphi_k$ is the power factor angle when the power load is running, and $k$ is the harmonic number.
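As a small worked sketch of the power calculation, the function below sums the per-harmonic contributions to active and reactive power. The function name and the list-based inputs (effective voltage, effective current, and phase angle in radians for each harmonic) are assumptions for illustration.

```python
import math

def active_reactive_power(u_rms, i_rms, phi):
    """Sum U_k * I_k * cos(phi_k) and U_k * I_k * sin(phi_k) over the harmonics."""
    p = sum(u * i * math.cos(a) for u, i, a in zip(u_rms, i_rms, phi))
    q = sum(u * i * math.sin(a) for u, i, a in zip(u_rms, i_rms, phi))
    return p, q
```

With a single harmonic and zero phase angle, the reactive part vanishes, which is the purely resistive case described above; a phase angle of 90 degrees gives the purely reactive case.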

We draw the images of active power and reactive power of each device on the same coordinate axis and obtain the comparison diagram of active power and reactive power of each device. The comparison diagram of YD1 device and YD9 device is shown in Figure 7.

As can be seen from Figure 7, the active power of YD1 is greater than its reactive power, whereas the active power of YD9 is not always greater than its reactive power: during part of a sampling period, the active power is less than the reactive power. YD9 therefore differs clearly from the other equipment in the comparison of active power and reactive power.

(3) Harmonic Matrix. The harmonic data contain the unique characteristics of different electrical appliances. The harmonics of the load voltage or current can be extracted by Fourier transform or wavelet transform and then used to identify the load. It should be noted that most loads produce even harmonics with small amplitude and odd harmonics with large amplitude, and that low-order harmonics contain a large amount of information [17]. Therefore, this paper studies the 2nd to 11th harmonic data. The amplitude of each harmonic content rate of each device is calculated to obtain the harmonic feature database in Table 4, in which each row gives the amplitude of the kth harmonic content rate of each device.

It can be seen from Table 4 that resistive loads, such as incandescent lamps and kettles, produce few harmonics, while nonresistive loads, such as induction cookers and electric fans, produce rich harmonics. The second and third harmonic contents of YD1, YD2, YD3, YD5, YD6, and YD8 are above 90%, whereas those of YD9, YD10, and YD11 are significantly lower than 90%, which makes it possible to distinguish these loads.
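The extraction of the 2nd to 11th harmonics by Fourier transform can be sketched as follows. The sketch assumes that the samples cover exactly one fundamental period and that the harmonic content rate of order k is the amplitude of the kth harmonic divided by the amplitude of the fundamental; both the assumption and the function names are illustrative rather than taken from the paper.

```python
import cmath
import math

def harmonic_amplitude(samples, k):
    """Amplitude of the k-th harmonic from a direct DFT over one period."""
    n = len(samples)
    coeff = sum(x * cmath.exp(-2j * math.pi * k * m / n)
                for m, x in enumerate(samples))
    return 2.0 * abs(coeff) / n

def harmonic_content_rates(samples, orders=range(2, 12)):
    """Ratio of each harmonic amplitude to the fundamental amplitude."""
    fundamental = harmonic_amplitude(samples, 1)
    return {k: harmonic_amplitude(samples, k) / fundamental for k in orders}
```

For a current made of a fundamental plus a third harmonic of half its amplitude, the rate of order 3 is 0.5 and the other rates are close to zero.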

In this paper, the variance of the harmonic content rate of each device under different working conditions is calculated to describe how the harmonic content rate varies across working conditions. A default value indicates that the gear does not exist in the device; for example, device 1 cannot be switched to the 4th or 5th gear. The results are shown in Table 5.

As can be seen from Table 5, in the closed state, the variance of harmonic content rate of YD1, YD2, YD3, YD5, and YD6 is greater than that of the other equipment. For a single device, such as YD4, the variance of harmonic content rate is small in the closed state, and the harmonic content rate then increases rapidly when the device is switched to the first gear. In addition, the higher the gear, the lower the variance of harmonic content rate, and the harmonic content rate becomes almost constant. Therefore, the variance of harmonic content rate can be used as a basis for identification.

3. Mathematical Model and Application of Automatic Identification of Single Device

3.1. Similarity and Weight Coefficient

To automatically identify an unknown single device, the feature similarity of the load mark can be analyzed [14]. The feature similarity $s(x_i, y_j)$ is defined on $x_i$, the eigenvector of the unknown device $i$, and $y_j$, the eigenvector of the known device $j$. The larger the value of $s(x_i, y_j)$, the higher the similarity between the unknown device $i$ and the known device $j$.

The similarity of the load marks extracted in this paper is calculated in four types. $S_1(i, j)$ represents the similarity of the transient characteristics of device $i$ and device $j$. Similarly, $S_2(i, j)$ and $S_3(i, j)$ represent the similarity of the V-I trajectory characteristics and of the harmonic characteristics of device $i$ and device $j$.

$S_4(i, j)$ represents the contrast similarity of active power and reactive power, defined as the image similarity between the active power and reactive power comparison diagrams of the two devices. The similarity is calculated with the histogram method [18]: the histograms of the two images are calculated first, and then a distance measure between them is calculated. The Bhattacharyya distance is chosen as the measure; it is based on the coefficient

$$\rho(p, q) = \sum_{k} \sqrt{p(k)\, q(k)},$$

where $p$ and $q$ are the normalized histograms of the two images.

Finally, the contrast similarity of active power and reactive power between device $i$ and device $j$ is calculated. The total similarity is calculated as a weighted sum, and the weights are determined by the entropy method [19].
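The histogram comparison used for the power contrast similarity can be sketched as below. The sketch works on plain lists of pixel or sample values rather than image files, uses a fixed number of equal-width bins, and scores two histograms with the Bhattacharyya coefficient, which equals 1 for identical histograms and 0 for histograms with no overlap. The function names, the binning scheme, and the reading of the distance measure as the Bhattacharyya coefficient are assumptions.

```python
import math

def normalized_histogram(values, bins, lo, hi):
    """Equal-width histogram of values in [lo, hi], scaled to sum to 1."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # put v == hi in the last bin
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]

def bhattacharyya_coefficient(p, q):
    """Overlap between two normalized histograms."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))
```

Because the coefficient grows with the overlap, it can be used directly as a similarity score in a weighted sum with the other feature similarities.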

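The entropy weight coefficient [20] of each target is expressed as follows:

$$w_k = \frac{1 - e_k}{\sum_{l=1}^{m}\left(1 - e_l\right)},$$

where $e_k$ is the entropy value of the $k$th target and $m$ is the number of targets.

Through the entropy value method, the weights $w_1$, $w_2$, $w_3$, and $w_4$ of the four feature similarities are obtained.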
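A minimal sketch of the entropy value method is given below, in its commonly used form. Each row of the input is assumed to hold the four feature similarities for one known device, each column is normalized to proportions, and a column whose values are identical across devices receives zero weight because it carries no discriminating information. The function name and the data layout are assumptions; entries are assumed nonnegative with at least two rows.

```python
import math

def entropy_weights(matrix):
    """Entropy-method weights for the columns of a nonnegative matrix."""
    n = len(matrix)
    m = len(matrix[0])
    diversities = []
    for k in range(m):
        column = [row[k] for row in matrix]
        total = sum(column)
        props = [c / total for c in column]
        # Terms with p == 0 contribute nothing (0 * ln 0 is taken as 0).
        entropy = -sum(p * math.log(p) for p in props if p > 0) / math.log(n)
        diversities.append(1.0 - entropy)
    total_div = sum(diversities)
    return [d / total_div for d in diversities]
```

The weights sum to 1, and a feature whose similarity values vary more strongly across the known devices receives a larger weight.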

3.2. Establishment of the Similarity Model

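The load identification model based on similarity takes a weighted sum of the individual feature similarities to obtain the total similarity of the load features. The specific model is as follows:

$$S(i, j) = w_1\, s\!\left(x_i^{(1)}, y_j^{(1)}\right) + w_2\, s\!\left(x_i^{(2)}, y_j^{(2)}\right) + w_3\, s\!\left(x_i^{(3)}, y_j^{(3)}\right) + w_4\, S_4(i, j),$$

where $x_i^{(1)}$ is the transient eigenvector of the device $i$ to be tested, $y_j^{(1)}$ is the transient eigenvector of the known device $j$, $x_i^{(2)}$ is the V-I trajectory eigenvector of device $i$, $y_j^{(2)}$ is the V-I trajectory eigenvector of device $j$, $x_i^{(3)}$ is the harmonic eigenvector of device $i$, $y_j^{(3)}$ is the harmonic eigenvector of device $j$, $S_4(i, j)$ is the comparison similarity of active power and reactive power between the device to be tested and the known device, and $w_1, \ldots, w_4$ are the weights of the feature similarities.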
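The weighted-sum model and the selection of the best match can be sketched as follows. The per-feature similarity is taken to be the cosine similarity, which is an assumption because the similarity formula is not reproduced in this text; the dictionary keys, device labels, and function names are likewise hypothetical.

```python
import math

def cosine_similarity(x, y):
    """ASSUMED per-feature similarity between two eigenvectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm

def total_similarity(unknown, known, power_similarity, weights):
    """Weighted sum of the three vector similarities and the power similarity."""
    parts = [cosine_similarity(unknown[name], known[name])
             for name in ("transient", "vi", "harmonic")]
    parts.append(power_similarity)
    return sum(w * s for w, s in zip(weights, parts))

def identify(unknown, database, power_similarities, weights):
    """Return the known device with the highest total similarity, plus all scores."""
    scores = {dev: total_similarity(unknown, feats, power_similarities[dev], weights)
              for dev, feats in database.items()}
    return max(scores, key=scores.get), scores
```

Here `database` maps each known device label to its three eigenvectors, and `power_similarities` holds the precomputed power contrast similarity between the unknown device and each known device.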

3.3. Application of Model
3.3.1. Feature Extraction and Recognition of Unknown Devices

Using the V-I trajectory and harmonic matrix methods, the feature matching data of the two unknown devices are extracted as follows.

It can be seen from the characteristic data in Table 6 that, when the two devices to be tested are in the first gear position, the difference in their V-I trajectory curves caused by the standard deviation of instantaneous resistance is relatively large.

As can be seen from Table 7, one of the unknown devices produces few harmonics, whereas the other produces abundant harmonics: the third and fifth harmonic content rates of the latter are nearly 50%, while the harmonic content of the former is less than 1%.

As can be seen from Table 8, in the closed state, the variances of harmonic content rate of the two unknown devices differ little. For one device, the variance of harmonic content rate is small in the closed state, the harmonic content rate then decreases slowly when the device is switched to the 1st level, and it continues to decrease when the device is switched to the 2nd level. For the other device, the variance of harmonic content rate is small in the closed state, then gradually decreases as the gear increases, and finally remains almost constant.

Using the established model and the relevant data, the similarities between the two unknown devices and YD1 to YD11 are calculated; the results are as follows.

Based on the results shown in Tables 9 and 10, one unknown device has the highest similarity with device 8 and is therefore identified as device 8. The other unknown device has the highest similarity with device 9 and is therefore identified as device 9.

3.3.2. Calculation of Real-Time Power Consumption of Unknown Equipment

In the equipment data given in Annex 2, $U$, $I$, and $\lambda$ are the measured voltage, current, and power factor, respectively. The real-time power consumption is calculated as

$$P = U I \lambda.$$

According to the above calculation formula and the data given in Annex 2, the real-time power consumption of the unknown device is obtained. Table 11 shows part of the calculation results for the real-time power consumption of unknown device 1.
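The real-time power calculation amounts to one multiplication per sample, as the short sketch below shows. The function names are hypothetical, and the conversion of a power series into energy in kilowatt-hours assumes a fixed sampling interval, which is an addition for illustration rather than a step stated in the paper.

```python
def real_time_power(voltage, current, power_factor):
    """Instantaneous active power in watts from measured U, I, and power factor."""
    return voltage * current * power_factor

def energy_kwh(powers_w, dt_s):
    """Energy in kWh from power samples (W) taken every dt_s seconds."""
    return sum(powers_w) * dt_s / 3.6e6
```

For example, a reading of 220 V, 2 A, and a power factor of 0.5 corresponds to 220 W, and a constant 1000 W sampled once per second for one hour accumulates to 1 kWh.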

4. Conclusion

Based on the data analysis, this paper first uses MATLAB to detect and distinguish abnormal points and mutation points in the original data with the wavelet decomposition value method. Secondly, the data are denoised by wavelet noise reduction, which completes the preprocessing of the sampled data points of each device. Finally, the abnormal point detection results of a certain device are obtained, and the waveform after noise reduction is drawn.

In the process of feature extraction, firstly, the transient and steady-state characteristics of a single device are extracted by analyzing the preprocessed data; they include active power, reactive power, harmonic current, and the voltage-current trajectory (V-I trajectory). Secondly, the computation and extraction methods for the characteristic values of each load characteristic are given. Finally, the characteristic values of the equipment are obtained, comprising the V-I trajectory characteristics of gears 1 to 5, the comparison diagram of active power and reactive power of each device, the amplitude of the kth harmonic content rate of each device, and the variance of harmonic content rate in each operating state of each device.

In the automatic identification of a single device, this paper identifies any single device by establishing a similarity model. Based on the four types of load characteristics extracted, a similarity-based load identification model is established. Firstly, the feature similarity is defined to denote the degree of similarity of any two devices, and the weight coefficient of each feature similarity is determined by the entropy value method. Secondly, the weighted sum of the feature similarities is used to determine the total feature similarity, and the known device with the highest similarity is matched to the unknown device. Finally, the similarity data between the unknown devices and devices 1-11 are obtained. According to the calculation results, one unknown device is determined to be device 8, and the other unknown device is determined to be device 9.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

Hongyan Li expresses sincere thanks to all those who helped in the course of writing this paper, would like to show sincere gratitude to Mr Xianfeng Ding, who gave much useful advice on writing and did his best to improve the paper, and would also like to express gratitude to the classmates who offered references and information in a timely manner. Without their help, it would have been much harder to finish this paper.