Abstract
Conventional fault diagnosis and production calculation for an oil well rely on surface dynamometer cards, which are obtained from a load sensor installed on the horse head. This way of measuring dynamometer cards is limited by sensor maintenance and calibration, battery replacement, and safety hazards for staff. As basic parameters of the oil extraction industry, electrical parameters have the advantages of low cost and high efficiency, so the inversion of dynamometer cards from electrical parameters is attracting more and more attention. To address the problem of insufficient data and to meet the real-time requirements of the actual oil extraction process, this paper proposes a novel hybrid model that consists of two parts: a mechanism model of the polished rod load and the suspension displacement, calculated with the space vector equations of the motor, and a data-dependent kernel online sequential extreme learning machine (DDKOSELM) model that corrects the output error of the mechanism model, improves the kernel function selection, and runs in real time. The highlights of this paper can be summed up in two points. First, with the bottom dead point detected without sensors, the mechanism modeling of the polished rod load and suspension displacement is carried out from the perspective of the mathematical model of the AC motor. Second, a novel data-driven model based on DDKOSELM is proposed to improve the kernel function selection. The coefficients in the data-dependent kernel function are optimized with the improved free search algorithm (IFSA). The proposed hybrid model has been applied to a normally working oil well, and the prediction results show better accuracy than the pure data-driven model and the pure mechanism model.
1. Introduction
The beam pumping unit plays a predominant part in oil extraction; it is the major artificial lift method and is employed by more than 80% of oil wells all over the world [1]. As one of the key data sources for identifying the various downhole operating conditions, the dynamometer card has an immense influence on oil well fault diagnosis. It describes the relationship between the displacement and the load of the polished rod in each stroke and fully reflects the oil exploitation status of the pumping unit. Accordingly, the real-time measurement of the dynamometer card has long been a major topic in oil well diagnosis. In actual production, the dynamometer card is commonly measured by directly mounting a load sensor. The existing detection methods have the following defects:
Installing the dynamometer card measuring equipment requires unloading the suspension point; the operation is cumbersome and carries safety risks.
Generally, the integrated dynamometer card measuring instrument is battery powered, which requires a large amount of daily maintenance. Because production is affected when the well is forced to stop for battery replacement or load sensor installation, the measurement intervals are usually relatively long, and the downhole conditions cannot be reflected in real time. Therefore, the fault diagnosis of the pumping well may not be timely.
The load of the suspension point is measured by a pressure strain gauge, which must be calibrated periodically. Besides, the surface dynamometer card tester mainly reflects the underground working condition. It is difficult for it to fully reflect many important operation indicators such as the balance of the pumping unit, the system efficiency, energy consumption, and motor running status. This leads to a single measurement function and poor cost-effectiveness.
As significant reference variables that directly reflect the working condition of the beam pumping system in real time, the electrical parameters of the induction motor are often ignored by diagnostic staff. Several methods for transforming motor parameters into a dynamometer card have been proposed in recent years. Li et al. [2] presented a method for drawing the dynamometer card based on the actual input power curve of the motor, but it needs to estimate the location of the bottom dead point, which decreases the accuracy of the calculation. Qi [3] realized the indirect measurement of the dynamometer card of the pumping unit by combining the rotating speed method with the mechanical characteristic curve of the pumping unit, but a modified engineering formula is used to calculate the electromagnetic torque, which affects the accuracy of the method. Besides, the mechanical characteristic curve of the pumping unit is obtained from a load sensor installed on the polished rod. Zhang et al. [4] detected the dead point position of the polished rod by installing two microswitches near the joint between the pumping beam and the bracket and then obtained the corresponding relation between the power of the motor and the displacement of the polished rod. Because of the mechanical friction between the beam and the microswitches, the accuracy is also affected. Guo and Ma [5] established a mathematical model containing useful motor power, polished rod load, and polished rod displacement, which describes the relationship between the beam pumping unit output and the load. Luo et al. [6] developed a multiple-factor nonlinear mathematical model to analyze the operating performance of the induction motor system and introduced the field-circuit coupling equation of induction motors and the mechanical model of the four-bar linkage system.
The mechanism models above all involve uncertain environmental factors that are hard to describe precisely, because they change in real time as the operating conditions of the pumping units fluctuate. Consequently, data-driven methods have attracted increasing attention from researchers. Li and Han [7] proposed a serial hybrid model of motor load torque by establishing a data-driven model of underground friction. Considering that the torque factor is zero when the polished rod is at the top and bottom dead centers, Zhang et al. [8] proposed a power-dynamometer card inversion model based on big data technologies, collecting the measured power curves and corresponding dynamometer cards of typical wells under different working conditions within a period of time as the training data for a deep belief network. However, when data are insufficient, it is difficult for a data-driven model to achieve excellent performance. Following the above discussion, we adopt hybrid modeling, combining the mechanism model with the data-driven model, to solve the problem of dynamometer card inversion. Hybrid modeling algorithms have been successfully applied to a multitude of complex industrial processes [9–22], and they mostly use either a serial structure or a parallel structure. The serial hybrid model first performs data-driven modeling of the uncertain parameters and then takes the outputs of the data-driven model as the inputs of the mechanism model. In the parallel hybrid model, the output is the sum of the outputs of the data-driven model and the mechanism model, and the data-driven model is built to compensate for the error of the mechanism model. The latter structure is adopted in this paper.
In summary of the above discussion, our main contributions in this paper are as follows. For the mechanism model, the relationship among voltage, current, and torque in the stationary two-phase rectangular coordinate system is derived from the mathematical model of the AC induction motor. The instantaneous torque of the AC induction motor is identified from the stator line voltage and line current of the motor, and the polished rod load is then calculated from the motor torque with the crank weight taken into account. Finally, the corresponding relation between the polished rod load and the suspension displacement is obtained from the signal of the crank rotation angle. For the data-driven model, the inputs are the mechanism and electric motor parameters, including stroke, crank radius, stroke frequency, pulley diameter, unbalanced weight, crank weight, balance block weight, two line voltages and line currents of the stator, motor speed, and other technical parameters, and the outputs are the estimated errors between the actual measured values and the outputs of the mechanism model.
2. Process Description
The typical structure of a beam pumping unit is shown in Figure 1. According to the working principle of oil extraction, the oil pump barrel is placed below the oil surface through the tubing, with a standing valve installed at its bottom; the piston is placed into the pump barrel through the tubing by the sucker rod, with a travelling valve installed on top of it. The sucker rod is connected to a polished rod suspended from the “horse head”. Through the crankshaft and the four-bar linkage of the pumping unit, the rotational motion of the motor is turned into an up-and-down reciprocating movement, which drives the pump to extract oil. The load of the beam hanger is called the polished rod load, which reflects the load of the sucker rod column and the liquid column above the piston. During the upstroke, the sucker rod pulls the plunger up, the travelling valve is closed, the standing valve is opened, oil is drawn into the pump, and the polished rod load is equal to the sum of the weight of the sucker rod and the weight of the liquid above the plunger. During the downstroke, the sucker rod pushes the plunger down, the standing valve closes first, the travelling valve is then pushed open under the pressure, and the pump discharges the oil.
In a stroke, the closed curve mapped out by the load and displacement of the suspension point is known as the dynamometer card, and the area of the curve is the size of the work done by the pump in a single stroke.
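Because the card is a closed curve in the displacement-load plane, its enclosed area can be estimated numerically from sampled points. The following is a minimal sketch using the shoelace formula; the function name `card_area` and the four sample vertices are illustrative assumptions, not data from the well studied here.

```python
import numpy as np

def card_area(displacement, load):
    """Area enclosed by a closed (displacement, load) curve via the shoelace formula."""
    x = np.asarray(displacement, dtype=float)
    y = np.asarray(load, dtype=float)
    # np.roll pairs each vertex with the next one and closes the loop.
    return 0.5 * abs(np.sum(x * np.roll(y, -1) - np.roll(x, -1) * y))

# Hypothetical four-vertex card: displacement in m, load in kN.
area = card_area([0.0, 1.0, 4.0, 3.0], [1.0, 3.0, 3.0, 1.0])
print(area)  # work per stroke in kJ for these units
```

With displacement in metres and load in kilonewtons, the returned area is in kilojoules per stroke. The points must be ordered along the curve; the direction of traversal does not matter because the absolute value is taken.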
When only the static load of the liquid column above the rod string and the plunger surface and the elastic deformation of the sucker rod and tubing are considered, and sand, wax, gas, and other environmental interference are neglected, the resulting curve is the theoretical dynamometer card, which is similar to a parallelogram. As shown in Figure 2, line AB is the loading section, which starts at the moment when the plunger reaches the bottom dead point and the upstroke begins. The travelling valve is closed, the plunger pushes the liquid column above it upward, and the weight is loaded on the polished rod through the sucker rod. Elastic deformation of the tubing and sucker rod occurs: the tubing shortens and the sucker rod elongates. During this period, although the polished rod is moving upward, the plunger does not move relative to the pump barrel.
When the load on the polished rod has increased enough to pull the plunger, the elastic deformation of the tubing and sucker rod ends and the high-load section BC begins. During this section the load of the polished rod stays the same, the plunger is pulled upward, the standing valve is opened, and the pumped oil fills the pump space until the top dead point.
In the unloading section CD, the plunger reaches the top dead point and starts to go down, the travelling valve is opened, and the standing valve is closed. Elastic deformation of the tubing and sucker rod occurs again: the tubing elongates and the sucker rod shortens. During this time, although the polished rod is moving downward, the plunger again does not move relative to the pump barrel.
When the load on the polished rod has been reduced to the weight of the plunger and the polished rod, the elastic deformation of the tubing and sucker rod ends and the low-load section DA begins. During this section the load of the polished rod remains the same, the plunger and the polished rod move downward at a constant speed, the travelling valve is open, and the pumped oil flows into the plunger from the pump barrel.
It follows from the oil pumping process described above that the shape of the dynamometer card changes correspondingly with the combined variation of gas, liquid, and machinery.
3. Mechanism Modeling of Beam Pumping Units
3.1. Pumping Motor
In the actual oil production field, the induction motor is employed as the main oil pumping motor, and the space vector equations of the AC motor can be described as follows [23]: where is the stator voltage space vector, is the rotor voltage space vector, is the stator current space vector, is the rotor current space vector, is the stator flux linkage space vector, is the rotor flux linkage space vector, is the resistance of the motor stator, is the resistance of the motor rotor, is the stator self-inductance, is the rotor self-inductance, is the mutual inductance between the rotor and stator windings, and is the angular velocity of the rotor.
Then the $\alpha$ axis of the two-phase static coordinates ($\alpha\beta$ coordinate system) is aligned with the A axis of the three-phase static coordinate system (ABC coordinate system), and the $\beta$ axis of the two-phase static coordinates lags the $\alpha$ axis by 90°. The electromagnetic torque, synchronous angular velocity, and electromagnetic power in the $\alpha\beta$ coordinate system can then be obtained by coordinate transformation as follows: where is the number of pole pairs, is the synchronous angular velocity of the motor, is the mechanical angular velocity, is the electromagnetic torque, and is the electromagnetic power. The line voltage and line current components in the $\alpha\beta$ coordinate system are obtained by the phase transformation of the stator line voltage and line current directly measured in the electrical parameter acquisition process.
Considering the mechanical loss of the motor, the output torque and output power of the motor can be obtained from the following formulas: where is the slip angular velocity of the rotor, is the output torque of the motor, is the slip ratio, and is the mechanical loss of the motor.
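As an illustration of how an electromagnetic torque can be identified from stator quantities alone, the sketch below uses the textbook stator-flux estimator: the flux linkage is the time integral of the voltage minus the resistive drop, and the torque is proportional to the cross product of flux and current. This is a generic form under stated assumptions (an amplitude-invariant transformation, components already expressed in the two-phase stationary frame, rectangular integration); the function name, argument names, and axis sign convention are hypothetical and are not taken from the equations of this paper.

```python
import numpy as np

def electromagnetic_torque(u_ab, i_ab, r_s, pole_pairs, dt):
    """Estimate torque from stator alpha-beta voltages and currents.

    u_ab, i_ab: arrays of shape (n, 2) holding the alpha and beta components.
    r_s: stator resistance, pole_pairs: number of pole pairs, dt: sample time.
    """
    u_ab = np.asarray(u_ab, dtype=float)
    i_ab = np.asarray(i_ab, dtype=float)
    # Stator flux linkage: running integral of (u - R_s * i), rectangular rule.
    psi = np.cumsum((u_ab - r_s * i_ab) * dt, axis=0)
    # Torque from the cross product of flux and current
    # (factor 1.5 assumes an amplitude-invariant transformation).
    return 1.5 * pole_pairs * (psi[:, 0] * i_ab[:, 1] - psi[:, 1] * i_ab[:, 0])

# Hypothetical constant inputs, only to show the call shape.
torque = electromagnetic_torque(
    u_ab=[[1.0, 0.0]] * 3, i_ab=[[0.0, 2.0]] * 3, r_s=0.0, pole_pairs=2, dt=0.1
)
```

Open-loop integration of this kind tends to drift when the measurements carry an offset, which is one reason the flux estimation is listed later as a source of mechanism model error.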
3.2. The Movement Rule of Pumping Units
The surface structure of the beam pumping unit consists of the drive motor, the reduction gearbox, and the four-bar linkage. For the four-bar linkage mechanism, the fixed rod is the connection between the beam support point and the crankshaft center, and the three movable rods are the crank, the connecting rod, and the backward beam. As shown in Figure 3, is the radius of the crank, is the length of the connecting rod, is the length of the backward beam, is the length of the forward beam, is the length of the fixed rod, denotes the mechanical angular velocity, and are reference points, and is the vertical height from the support point to the center of the gearbox. Taking the clockwise direction as positive, the collection of motor parameters starts at the 12 o’clock position of the crank, and denotes the clockwise rotation angle of the crank.
The angle between the backward beam and connecting rod can be calculated by applying the cosine theorem to triangle and :
The angle between the crank and connecting rod can be calculated as follows:
The angles of when the horse head is at the top and bottom dead points are obtained as follows, respectively:
Counting from the bottom dead point (12 o’clock position of the crank), the motor parameters are collected every 40 ms. The crank angle can be calculated by multiplying time by the motor speed, and the displacement is then expressed as a percentage of the stroke:
where is the displacement of the suspension point and is the stroke of the sucker rod pumping system.
When the beam pumping unit is working, the torque synthesized on the crankshaft by the suspension point load and the balance weight is in equilibrium with the output torque of the motor: where is the torque factor, and are the transmission ratio and transmission efficiency of the belt, and are the transmission ratio and transmission efficiency of the reduction gearbox, and are the weights of the crank balance block and the crank, is the crank radius, and is the load of the suspension point.
Thus, the relation between the displacement and the crank angle is obtained from (11), and the load of the suspension point can be calculated by (12). The dynamometer card curve can then be obtained by variable substitution, eliminating the crank angle from the displacement and load curves.
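The step of turning sampled motor speed into a crank angle can be sketched as follows. This is a minimal illustration assuming that the crank speed equals the motor speed divided by the total transmission ratio (belt ratio times gearbox ratio) and that the first sample is taken at the trigger position; the function name and the numbers in the example are hypothetical.

```python
import numpy as np

def crank_angle(motor_speed, dt, total_ratio):
    """Crank angle (rad) at each sample, starting from 0 at the trigger position.

    motor_speed: motor angular velocity samples in rad/s.
    dt: sampling interval in s.
    total_ratio: assumed overall transmission ratio between motor and crank.
    """
    crank_speed = np.asarray(motor_speed, dtype=float) / total_ratio
    # Angle accumulated before each sample; the first sample sits at angle 0.
    theta = np.concatenate(([0.0], np.cumsum(crank_speed[:-1] * dt)))
    return np.mod(theta, 2.0 * np.pi)

# Hypothetical constant motor speed sampled every 40 ms.
theta = crank_angle([100.0] * 5, dt=0.04, total_ratio=100.0)
```

Once displacement and load are both available at the same crank angle samples, pairing them sample by sample gives the points of the dynamometer card, which is the variable substitution described above.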
4. Data-Dependent Kernel Online Sequential Extreme Learning Machine
It can be observed from the description in Section 3 that many factors affect the accuracy of the mechanism model, such as the flux estimation, system power loss, sliding friction, and viscous friction. Making use of the related data, we employ a data-based regression algorithm to compensate the mechanism model by training on the deviation between the output of the mechanism model and the measured value.
4.1. Kernel Extreme Learning Machine
The extreme learning machine (ELM) [24] was initially proposed by Huang for single-hidden-layer feedforward neural networks (SLFNs) and was later extended to generalized single-hidden-layer feedforward networks in which the hidden layer need not be neuron-like. As a supervised algorithm, its hidden layer parameters are randomly generated, and the output weights are then determined analytically by a generalized inverse operation. For a set of training samples , there is a nonlinear relationship between the input and the output . According to the ELM model, this nonlinear relationship can be expressed as a linear projection in the feature space. The outputs of ELM with L hidden layer neuron nodes are mathematically formulated as where and are the parameters of the hidden layer of the SLFN and is the weight that connects the th hidden layer node to the output layer node. is the output of the th hidden layer node with respect to the input. The activation function can be a sigmoid, a radial basis function, sine, cosine, wavelet, or many other piecewise continuous functions. The architecture of the basic ELM model with single input and single output is shown in Figure 4.
Then the hidden layer output matrix is written as
The training purpose of the ELM model is to find the optimal output weight matrix with the following formula: where denotes the output matrix and is the Moore-Penrose generalized inverse of the matrix. In order to keep the norm of the output weights small while minimizing the training error, the following optimization problem is solved:
After solving the quadratic optimization problem above, the output of the ELM is obtained as follows:
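The basic ELM training procedure described above (random hidden layer, output weights from the Moore-Penrose pseudoinverse) can be sketched as follows. This is a minimal illustration on a toy sine-fitting task, assuming a sigmoid activation and normally distributed random hidden parameters; the function names and the toy data are hypothetical and are not the configuration used in this paper.

```python
import numpy as np

def elm_fit(X, T, n_hidden, rng):
    """Train a basic ELM: random hidden layer, least-squares output weights."""
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights
    b = rng.normal(size=n_hidden)                 # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))        # hidden layer output matrix
    beta = np.linalg.pinv(H) @ T                  # Moore-Penrose solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Evaluate the trained ELM on new inputs."""
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

rng = np.random.default_rng(0)
X = np.linspace(-3.0, 3.0, 200).reshape(-1, 1)
T = np.sin(X)
W, b, beta = elm_fit(X, T, n_hidden=50, rng=rng)
train_rmse = float(np.sqrt(np.mean((elm_predict(X, W, b, beta) - T) ** 2)))
```

Only `beta` is learned; the hidden parameters `W` and `b` stay at their random initial values, which is what makes the training a single linear solve.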
If the nonlinear mapping function is unknown to the user, a kernel matrix can be used in its place. The kernel matrix for ELM is defined as follows:
Then the output function of kernelbased ELM can be expressed as follows:
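A kernel-based ELM of this kind can be sketched in a few lines. The sketch below assumes the commonly used regularized form, in which the coefficients solve a linear system built from the kernel matrix plus an identity term scaled by the inverse of a regularization constant `C`, together with a Gaussian kernel of width parameter `gamma`; these names and the toy data are assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kelm_fit(X, T, C, gamma):
    """Solve (I / C + Omega) alpha = T for the kernel ELM coefficients."""
    omega = rbf_kernel(X, X, gamma)
    return np.linalg.solve(np.eye(len(X)) / C + omega, T)

def kelm_predict(X_new, X, alpha, gamma):
    """Kernel ELM output: kernel row between new and training inputs times alpha."""
    return rbf_kernel(X_new, X, gamma) @ alpha

X = np.linspace(0.0, 2.0 * np.pi, 50).reshape(-1, 1)
T = np.sin(X[:, 0])
alpha = kelm_fit(X, T, C=1e3, gamma=1.0)
max_err = float(np.max(np.abs(kelm_predict(X, X, alpha, gamma=1.0) - T)))
```

The linear solve involves a square matrix whose side equals the number of training samples, which is the cost that motivates the online variant in Section 4.2.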
Evidently, the selection and construction of kernel functions greatly influence the performance of KELM. Besides, as the volume of the dataset increases, the training time complexity of $O(N^3)$ and the kernel matrix size of $N \times N$ for $N$ training samples become an unavoidable concern.
4.2. Online Sequential Extreme Learning Machine with Kernels
In the actual oil extraction process, the measured dynamometer card data arrive in a streaming fashion. Therefore, in order to ease the computational complexity of the inverse matrix in (18) and extend the kernel extreme learning machine to an online version, Simone et al. [25] proposed a kernel online sequential extreme learning machine (KOSELM) by straightforwardly extending kernel recursive least squares to the online sequential extreme learning machine (OSELM) framework.
According to the OSELM, the dataset is divided into successive minibatches , so that . Similar to OSELM, the first approximation of the weight from the first minibatch is computed as follows for KOSELM:
In combination with recursive least squares (RLS), at each new minibatch , the output weights are updated as follows. For each sample , denotes the error of the filter before the update and is the corresponding coefficient of the center . All the previous coefficients are updated by a factor .
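The recursive update can be illustrated with the standard kernel recursive least squares recursion, processed one sample at a time. The sketch below is a generic KRLS form without sparsification, assuming a Gaussian kernel and a regularization term equal to the inverse of a constant `C`; the class and attribute names are hypothetical, and it is not claimed to reproduce the exact update equations of [25].

```python
import numpy as np

def rbf(a, b, gamma):
    """Gaussian kernel between each row of a and the vector b."""
    return np.exp(-gamma * np.sum((a - b) ** 2, axis=-1))

class OnlineKernelRegressor:
    """Sample-by-sample kernel RLS; every sample is kept as a center."""

    def __init__(self, C, gamma):
        self.lam, self.gamma = 1.0 / C, gamma
        self.centers = self.Q = self.alpha = None

    def update(self, x, y):
        x = np.asarray(x, dtype=float)
        if self.centers is None:
            self.centers = x[None, :]
            self.Q = np.array([[1.0 / (self.lam + 1.0)]])  # k(x, x) = 1 for Gaussian
            self.alpha = self.Q[:, 0] * y
            return
        k = rbf(self.centers, x, self.gamma)
        z = self.Q @ k
        r = self.lam + 1.0 - k @ z            # Schur complement of the new sample
        e = y - k @ self.alpha                # prediction error before the update
        m = len(k)
        Q_new = np.empty((m + 1, m + 1))
        Q_new[:m, :m] = self.Q + np.outer(z, z) / r
        Q_new[:m, m] = Q_new[m, :m] = -z / r
        Q_new[m, m] = 1.0 / r
        self.Q = Q_new
        # Old coefficients are corrected; the new center gets coefficient e / r.
        self.alpha = np.append(self.alpha - z * e / r, e / r)
        self.centers = np.vstack([self.centers, x])

    def predict(self, x):
        return rbf(self.centers, np.asarray(x, dtype=float), self.gamma) @ self.alpha

xs = np.linspace(0.0, 3.0, 20).reshape(-1, 1)
ys = np.sin(xs[:, 0])
model = OnlineKernelRegressor(C=100.0, gamma=1.0)
for x_k, y_k in zip(xs, ys):
    model.update(x_k, y_k)
```

After all samples have been streamed, the coefficients coincide with the batch solution of the regularized kernel system, so the recursion trades one large inversion for a sequence of cheap rank-one updates. Memory still grows with the number of centers, which practical variants limit by pruning or sparsification.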
4.3. Data-Dependent Kernel
In the traditional kernel ELM for regression, the selection of the kernel function mostly depends on experience, and the optimization of the kernel parameters generally depends on intelligent optimization algorithms such as the genetic algorithm, the clonal selection algorithm, and the particle swarm optimization algorithm. To a certain extent, these methods improve the performance of kernel-based learning algorithms. However, the impact of the data is not taken into account: because the kernel function is fixed regardless of the structure of the training samples, the distribution of the data in the kernel mapping space does not change, and the performance of the kernel learning algorithm cannot be fundamentally raised to a higher level. Thus, a novel data-dependent kernel learning method is proposed to improve the performance by replacing the basic kernel with a data-dependent kernel function [26–28]. Through a conformal transformation of the basic kernel, the data-dependent kernel is defined as where is a basic kernel function such as the sigmoid kernel or the Gaussian kernel and the positive-valued function of , is defined as where all the samples are chosen as “empirical cores” , is the number of samples, represents the combination expansion coefficients of , , and is a free parameter. Thus, the geometrical structure of the data in the kernel mapping space is determined by the expansion coefficients . The kernel matrices corresponding to and are denoted as and , respectively. Consequently, (29) can be rewritten as where the diagonal matrix . After defining and , we have
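The conformal transformation can be sketched numerically. The code below assumes the form commonly used in the data-dependent kernel literature: the factor function is a constant plus a weighted sum of Gaussian bumps centered on the empirical cores, and the transformed kernel matrix is the basic kernel matrix multiplied on both sides by the diagonal matrix of factor values. The function names, the parameter names `gamma` and `delta`, and the random toy data are assumptions for illustration only.

```python
import numpy as np

def gaussian_gram(A, B, width):
    """Gaussian kernel matrix exp(-width * ||a - b||^2) for rows of A and B."""
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=2)
    return np.exp(-width * sq)

def data_dependent_gram(X, coeffs, gamma, delta):
    """Conformally transformed kernel matrix with all samples as empirical cores.

    coeffs[0] is the constant term; coeffs[1:] weight one core per sample.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    K = gaussian_gram(X, X, gamma)        # basic kernel matrix
    K1 = gaussian_gram(X, X, delta)       # core functions evaluated at the samples
    q = coeffs[0] + K1 @ coeffs[1:]       # factor function value at each sample
    return q[:, None] * K * q[None, :]    # equals diag(q) @ K @ diag(q)

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2))
K_basic = gaussian_gram(X, X, 0.5)
K_same = data_dependent_gram(X, [1.0] + [0.0] * 6, gamma=0.5, delta=1.0)
K_tilde = data_dependent_gram(X, np.full(7, 0.2), gamma=0.5, delta=1.0)
```

With the constant term set to one and all core weights set to zero, the transformed matrix reduces to the basic kernel matrix, which gives a convenient sanity check before the coefficients are optimized.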
4.4. Kernel Optimization
There are many methods to optimize the combination expansion coefficients and the Gaussian kernel parameter . For these two key uncertain parameters, the improved free search algorithm (IFSA) [29] is adopted to find the optimal solution, searching for the optimal objective function value with the following equations: denotes the position where an individual completes a search step; denotes the search step; denotes the number of individuals; and denotes the search spatial dimension.
As a major parameter of the algorithm, the search radius is adjusted in real time. The initial value of it is set to be 1, and with the growth of the search step, the radius shrinks as (35). In addition, the sensitivity parameter determines the scope of the pheromone value and also confirms the initial search point of the next search round, so the modification of sensitivity parameters is shown as (36) and (37).
Thus, the implementation steps of improved free search algorithm can be summarized as Algorithm 1.

By applying the data-dependent kernel to the kernel online sequential extreme learning machine for regression, the training framework of the proposed IFSDDKOSELM error model is summarized as Algorithm 2.

5. Hybrid Model Structure
Based on the above discussion, the parallel hybrid model is constructed as shown in Figure 5; it not only considers the processing technology but also makes full use of the relevant data.
The inputs of the hybrid model are the mechanism and electric motor parameters, including stroke, crank radius, stroke frequency, pulley diameter, unbalanced weight, crank weight, balance block weight, three line voltages and line currents of the stator, motor speed, and other technical parameters. The outputs of the hybrid model are expressed as follows: where load and displacement are the outputs of the mechanism model and is the output of the IFS-optimized data-dependent kernel extreme learning machine (IFSDDKOSELM), which is the estimated error between the actual measured value and the output of the mechanism model.
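The parallel structure amounts to adding the learned error estimate to the mechanism prediction. A minimal sketch follows, with two hypothetical stand-in functions in place of the actual mechanism model and the trained error model.

```python
import numpy as np

def hybrid_predict(mechanism_model, error_model, inputs):
    """Parallel hybrid output: mechanism prediction plus learned error estimate."""
    inputs = np.asarray(inputs, dtype=float)
    return mechanism_model(inputs) + error_model(inputs)

# Hypothetical stand-ins for the two sub-models.
mechanism = lambda x: 2.0 * x
error_estimate = lambda x: np.full_like(x, 0.5)
y_hybrid = hybrid_predict(mechanism, error_estimate, [1.0, 2.0, 3.0])
```

During training, the target of the error model is the measured value minus the mechanism output, so that the sum of the two approximates the measurement.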
6. Case Research and Experimental Results
In this section, the proposed hybrid model is first employed to predict the load and the displacement of the suspension point of the beam pumping unit. Second, the curves of polished rod load and suspension point displacement with respect to crank angle are drawn from the outputs of the hybrid model. Finally, the dynamometer card is drawn from the curves of polished rod load and suspension point displacement through variable substitution. A normal oil well in the Liaohe oil field of China is taken as the case study; the technical parameters of the beam pumping unit and the parameters of the induction motor are shown in Tables 1 and 2, respectively.
The data used to establish the hybrid model are collected with the indicator instrument, the electric parameter measurement instrument, and two proximity switches. The indicator instrument installed at the suspension point collects the surface dynamometer cards periodically while the beam pumping unit is working. The collected dynamometer cards are taken as the reference data and the historical training data for the IFSDDKOSELM error predictive model. The electric parameter measurement instrument is installed in the electric control cabinet connected to the induction motor, with a sampling interval of 47.8ms, and the collected current, voltage, and power of the induction motor are taken as the input of the hybrid model. One proximity switch is installed at the 12 o’clock position on the pedestal of the crank shaft to measure the cycle of one stroke. When the crank rotates to the 12 o’clock position, it triggers the proximity switch, and the timer of the electric parameter measurement instrument runs until the crank reaches 12 o’clock again. Therefore, a periodic displacement curve can be obtained. The other proximity switch is installed on the motor shaft to measure the motor speed.
The data collection instruments are installed behind the electronic control cabinet (as shown in Figure 6(a)) and mainly include the RTU (Remote Terminal Unit), the electrical parameter acquisition board with LCD screen, and the industrial PC (as shown in Figure 6(b)). The RTU receives the surface dynamometer cards collected by the wireless integrated indicator instrument installed at the suspension point of the horse head. The electrical parameter acquisition board performs data collection and dynamometer card inversion, and the industrial PC stores the work data and generates work reports. The three modules communicate over TCP/IP.
In total, 1800 groups of measurement data, including the electric parameters and the dynamometer cards, were collected over three months from Aug 9, 2017, to Nov 10, 2017, to establish the data-driven model. Of the collected data, 1200 groups are randomly selected as the training samples and the other 600 groups are used as the test samples. Figure 7 shows the suspension point displacement curve in one stroke, Figure 8 is the torque factor curve, and Figure 9 is the predictive curve of the polished rod load obtained by the hybrid model. The resulting predicted dynamometer card is shown in Figure 10. The test errors of the maximum load, the minimum load, and the area of the dynamometer card obtained by the hybrid model are shown in Table 3. It can be concluded from Figure 10 and Table 3 that the dynamometer card predicted by the proposed hybrid model is basically consistent with the measured dynamometer card and faithfully reflects the actual working operation.
In order to verify the effectiveness of the proposed hybrid prediction method, the mechanism predictive model without the data-driven model and the pure IFSDDKOSELM model are chosen for comparison, and the prediction performances are shown in Figure 11(a). In addition, the LSSVM with a combination kernel function and the KOSELM in [25] are selected as comparison algorithms to verify the effectiveness of the IFSDDKOSELM. The KOSELM in [25] has been shown to predict better than ELM, KELM, and OSELM, so this paper directly compares the IFSDDKOSELM model with these two methods. The comparison results are shown in Figure 11(b).
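The random split into 1200 training and 600 test groups can be reproduced along the following lines. This is a sketch of one common way to do it; the seed and variable names are arbitrary assumptions, and the actual selection procedure used for the experiments is not specified beyond being random.

```python
import numpy as np

n_groups, n_train = 1800, 1200

# A fixed seed (arbitrary choice) makes the split reproducible.
rng = np.random.default_rng(2017)
indices = rng.permutation(n_groups)

train_idx = indices[:n_train]   # 1200 groups for training
test_idx = indices[n_train:]    # remaining 600 groups for testing
```

Indexing the input and target arrays with `train_idx` and `test_idx` keeps each electric parameter record paired with its dynamometer card.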
Root mean square error (RMSE) and mean absolute error (MAE) are used as the prediction evaluation indicators, which are defined as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|,$$
where $n$ denotes the number of test data, $y_i$ is the real value of the test data, and $\hat{y}_i$ is the prediction value. Each of the algorithms introduced above is run 50 times, and the average is taken as the experimental result, as shown in Tables 4 and 5.
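Both indicators are straightforward to compute from paired arrays of measured and predicted values. A minimal sketch with hypothetical sample values:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error between measured and predicted values."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))

# Hypothetical values: two exact predictions and one that is off by 2.
example_rmse = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
example_mae = mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

RMSE penalizes large individual errors more heavily than MAE, so reporting both gives a view of typical error as well as sensitivity to outliers.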
Figure 11(b) and Table 5 show that the IFSDDKOSELM model of this paper predicts the dynamometer card inversion better than the other two data-driven models. Although the LSSVM with a combination kernel function and the KOSELM in [25] show good prediction performance, the prediction accuracy is further improved by employing the optimized data-dependent kernel function. Figure 11(a) shows that the proposed hybrid model has better prediction accuracy than the pure IFSDDKOSELM model and the pure mechanism model, which is also confirmed by the corresponding RMSE and MAE of the different models in Table 4. Notably, because of the stable operating condition of the motor, the fluctuation of the dynamometer card predicted by the pure mechanism model is relatively weak.
In summary, the experimental results demonstrate that the proposed hybrid model achieves better dynamometer card inversion than the pure mechanism model and the pure data-driven model, and it can produce precise dynamometer cards for fault diagnosis of the beam pumping unit.
7. Conclusion
In actual oil production, real-time and accurate collection of dynamometer cards has a far-reaching influence on oil well fault diagnosis. In the actual oilfield, however, data are insufficient and data collection is not timely. This paper proposes a novel hybrid model for dynamometer card inversion based on the electrical parameters collected from the motor. The simulation results basically match the dynamometer cards measured in practice. The advantages of the proposed hybrid model are summarized as follows.
The mechanism model indirectly measures the polished rod load and suspension displacement from the perspective of the mathematical model of the AC motor. Besides, in the process of data collection, a proximity switch is used to realize cycle identification without a sensor.
In order to compensate for the output error of the mechanism model, an online learning method with a data-dependent kernel-based ELM (DDKOSELM) is presented for regression for the first time. The comparison results show that the data-dependent kernel function performs better than the combination kernel function.
Data Availability
As the project will not be finished until December 2019 and we have a confidentiality agreement with Liaohe Oilfield, the data cannot be released at present. For any information about the article, please contact us via [email protected].
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
Financial support from the National Natural Science Foundation of China under Grants 61573088, 61573087, and 61433004 is acknowledged. The authors are also grateful for the support from the Liaohe Oilfield of China National Petroleum Corporation, providing them with research and experimental conditions.