Abstract

Piecewise-linear functions can approximate nonlinear and unknown functions for which only sample points are available. This paper presents a range of piecewise-linear models and algorithms to aid engineers to find an approximation that fits best their applications. The models include piecewise-linear functions with a fixed and maximum number of linear segments, lower and upper envelopes, strategies to ensure continuity, and a generalization of these models for stochastic functions whose data points are random variables. Derived from recursive formulations, the algorithms are applied to the approximation of the production function of gas-lifted oil wells.

1. Introduction

Piecewise-linear functions are widely used to approximate functions for which only sample points are known and to model nonlinear functions. In petroleum engineering, the fluid flow from an oil well and the pressure drop in a pipeline can be approximated with a piecewise-linear function [1]. In optimization, nonlinear problems can be recast as a mixed-integer linear programming (MILP) problem, which is then solved with MILP algorithms [24].

Although many works in the literature are dedicated to piecewise linearization, few of them address the problem of generating piecewise-linear models which is often delegated to the scientist or engineer. Some techniques are based on point clustering, such as -means, which has found applications in nonlinear dynamic systems and control [5]. This work attempts to mitigate undesirable effects of clustering, particularly the influence of outliers and the synthesis of subpotimal approximations. Akin to the previous technique, a fuzzy clustering strategy is developed in [6] to represent a nonlinear system with a piecewise-linear model. However, fuzzy clustering is dependent on the initial parameters and can reach a local minimum. In order to reduce the chance of yielding a local minimum, genetic algorithms have been proposed for clustering [7]. Other works rely on local searches that introduce linear segments iteratively until satisfying a stopping criterion [8, 9].

However, all of the cited approaches do not offer a certificate of optimality and do not allow the user to specify desirable properties for the resulting approximation. To this end, this paper develops formal models and algorithms for generating piecewise-linear functions that fit best the needs of the end user. The models consider a fixed and maximum number of linear segments, lower and upper envelope approximations, constraints to ensure continuity, and stochastic generalizations of these models in which the data points are random variables. The underlying approximation problems for these models are cast in a recursive form that can often be efficiently solved with dynamic programming strategies. To the best of the knowledge of the authors, this is the first work on the synthesis of piecewise-linear functions based on formal models and algorithms which considers an extensive range of conditions and approximation metrics. Software tools are obtained from these strategies to assist scientists and engineers in the synthesis of piecewise-linear approximations of nonlinear models and processes.

Sections 2 through 5 develop a range of models and algorithms for piecewise-linear approximation. Section 6 reports computational results on the approximation of well production in gas-lifted oil fields. Section 7 offers a summary and suggests directions for future research.

2. Basic Models and Algorithms

Let be a set of points of a function such that , , and . may represent any function, hereafter referred to as generating function, which can be known explicitly, implemented by a simulator, or obtained from a real-world process such as the output flow from an oil well as a function of the compressed gas injection. The problem is to find a set of linear segments that best approximates . Assuming that the segment breakpoints are points in , a piecewise-linear model is a sequence of breakpoints such that(i);(ii)each subset is approximated by the linear segment for .

An approximation for is obtained by minimizing the squared error between the linear segment and the points in , for which an analytical solution yields the intercept and the slope . Table 1 presents a set of sample points (the sample points were obtained in an arbitrary manner by perturbing the oil production function of an oil well induced by the lift-gas injection rate ; the purpose of these sample points is only to illustrate the behavior of the models and algorithms to be developed in this work) that will serve to illustrate models and algorithms.

2.1. Fixed Number of Linear Segments

To find the piecewise-linear approximation with a fixed number of linear segments that minimizes the squared error, the following problem is solved:This problem can be recast in a recursive form and solved with a dynamic programming (DP) algorithm. For an introduction to dynamic programming in discrete domains, the interested reader can refer to the text books by Kleinberg and Tardos [10] and Cormen et al. [11]. The book by Bertsekas [12] is recommended for an advanced treatment of DP with particular emphasis on continuous domains. Let be the minimum cost to approximate the points with exactly segments. Then,where is the minimum squared error of approximating with just one segment. Notice that , for all , and for all . From recursive equation (2), clearly .

Theorem 1. Recursive equation (2) yields the piecewise-linear approximation of of minimum squared error having exactly linear segments.

Proof. The proof is by induction in and . For the basis with , is the optimal cost since the approximation of one point is undefined for all , and, with , for all . For the induction, take any and . Any approximation of with segments must approximate with segments and with one segment. The point is chosen to minimize the aggregated cost and the cost to approximate with one segment. Since is by induction the minimum cost to approximate with segments, is the minimum cost.

The optimal sequence is obtained recursively: ; is the index that minimizes in (2); is the index that minimizes ; and so forth until is reached, at which point . Algorithm 1 outlines the DP algorithm PL-Fixed for computing . The algorithm records in table the indices that minimize (2), which are the optimal breakpoints.

(1)  for   to   do
(2)  ;
(3)  for   to   do
(4)     ;
(5)    ;
(6)    ; ;
(7)    ;
(8)  for   to   do
(9)   for   to   do
(10)  ; ;
(11)  ; ;
(12) for   to   do
(13)  ; ;
(14)  for   to   do
(15)   if    then
(16)    ;
(17)    ;

PL-Fixed is correct because it solves recursive equation (2). It runs in time since . The notation , such that for all represents the set of all nondecreasing functions that grow at the same asymptotic rate of . Therefore, the time taken by algorithm PL-Fixed to solve an instance of size is approximately . A precise and extensive discussion on the running-time complexity of algorithms is found in [11].

Running the algorithm on the sample instance with yields the values and depicted in Table 2. The optimal sequence is recursively obtained from the table as follows: ; ; ; and , as indicated in boldface. The optimal sequence is according to which the first segment approximates the points , the second segment approximates , and the third segment approximates . The approximation cost is . Figure 1 depicts this approximation.

2.2. Maximum Number of Linear Segments

A variation of the previous model is obtained by limiting rather than fixing the number of linear segments. Each segment incurs a cost to trade off model simplicity and quality of approximation. This problem is also cast in a recursive form, with being the minimum cost to approximate with at most segments. Formally,where is defined as above, , and for all .

Theorem 2. Recursive equation (3) yields the piecewise-linear model of with at most segments, whose objective is the combination of the approximation error and a cost per segment.

Recursive equation (3) leads to a DP algorithm PL-Max, which is omitted here in the interest of brevity. PL-Max() runs in provided that . This algorithm records the index corresponding to the optimal breakpoint of (3) in a table . This table is key to determine the optimal number of segments and construct the optimal sequence .

The optimal number of segments and sequence of breakpoints are determined from the tables produced by PL-Max() as follows: ; is the index that minimizes in (3); is the index that minimizes ; and so forth until reaching the first breakpoint where for some . Then, and the optimal breakpoint sequence is .

With a limit and cost , PL-Max produces an optimal approximation that uses 4 out of the 5 segments available, inducing a total cost of . The optimal sequence is with four segments approximating the point sets , , , and .

3. Continuous Approximations

The preceding models may produce discontinuous piecewise-linear functions. To enforce continuity either because the actual function is known to be continuous or because continuity is desirable for numerical optimization, some adaptations are proposed for the basic models.

3.1. Ensuring Continuity at the Breakpoints

Continuity is enforced by approximating the set with segments going through and . The recursive formulations ensuring continuity with a fixed and maximum number of segments are identical to the preceding ones, except that is approximated with the segment passing through and . The theorem below is a consequence of this observation.

Theorem 3. Suppose that the coefficients of the linear segment approximating are and . Then, recursive formulations (2) and (3) produce optimal piecewise-linear approximation functions for that are continuous at the breakpoints using exactly and at most linear segments, respectively.

Applying PL-Max to the sample instance with a maximum of segments and cost , but forcing the segments approximating to go through the end points, yields the piecewise-linear function with segments, which is illustrated in Figure 2. This approximation is given by the sequence which, coincidentally, is identical to the one obtained without ensuring continuity. However, its approximation cost exceeds the optimal cost of the discontinuous approximation, which is expected because a discontinuous model encompasses its continuous counterpart. This property is formally stated by the proposition below.

Proposition 4. The approximation cost induced by the schemes that enforce continuity at the breakpoints, with a fixed and maximum number of segments, is not lower than the cost induced by the schemes that do not enforce continuity.

3.2. Rejecting Discontinuous Approximations

The basic models with fixed and maximum number of segments can be modified to ensure continuity by rejecting discontinuous approximations. The recursive equation for generating a piecewise-linear function with a fixed number of segments iswhere , the lines approximating and intercept at point , and if . Further, , for all , and .

Algorithm 2 gives the dynamic programming solution of recursive equation (4), for a fixed number of segments. The algorithm records in table the points where the linear segments intercept, which are used to determine the breakpoints of the piecewise-linear function. The algorithm runs in time provided that .

(1)  Same as lines () through () of PL-Fixed
(2)  for   to   do
(3)  for   to   do
(4)     ; ; ;
(5)  ; ; ;
(6)  for   to   do
(7)  ; ; ;
(8)  for   to   do
(9)    ;
(10)   if   and   then
(11)     ;
(12)     if   and   then
(13)      ; ; ;

Algorithm 2 applied to the sample instance with yields as the optimal sequence. However, the breakpoints are not those of the sequence but defined by the intercept of the segments: the first segment is defined for , the second for , and the third for . Notice that and as expected. The approximation cost is , which exceeds the cost of the discontinuous approximation.

3.3. Generating Continuous Approximations

Continuity may be ensured by solving a least squares problem at the moment that the recursive equation is evaluated. For a fixed number of segments , recursive equation (2) becomeswhile for a maximum number of segments (3) becomeswhere is obtained by solving the following problem:where and are the slope and intercept of the rightmost segment of the optimal approximation of with exactly or at most segments, depending on which recursion is being solved. if the problem is infeasible, for example, when .

The computation of entails minimizing the squared error of the approximation of with a linear segment, while ensuring that this segment intercepts the preceding one, the rightmost segment of the piecewise-linear approximation of , in the interval . The chief difficulty lies in the nonlinearity of (7b). However, the optimal solution is obtained by solving two quadratic programs. First, notice that variable is eliminated by replacing the constraints (7b)-(7c) with the following nonlinear inequality:which, for , is transformed into the linear inequalities and the following ones when :The end result is that can be computed by solving two convex, quadratic programs (QPs) and choosing the least costly solution. The QPs minimize the objective (7a), with the first being subject to constraint (9a), whereas the second is constrained by (9b).

Dynamic programming algorithms are obtained to solve recursive equations (5) and (6) using a standard QP solver. The algorithms run in time, with being an upper bound on the cost for solving QPs with variables and 3 constraints. The application of the DP algorithm for the sample instance with a fixed number of segments yields the sequence . The breakpoints of the segments are defined by their intercepts, which are for the first segment, for the second, and finally for the third. The approximation cost is .

4. Lower and Upper Envelopes

The preceding models can be modified to obtain piecewise-linear lower and upper envelopes of the generating function . For the upper envelope, the slope and intercept should minimize the squared error of the line approximating , while keeping these points below the line. Formally, this is achieved by solving the QP:

The basic algorithms can compute the optimal upper envelope, with a fixed and maximum number of segments, by solving to obtain the coefficients of the segment approximating . The theorem below formalizes these results for the upper envelope approximation. Algorithms are developed similarly for the lower envelope by changing the direction of inequality (10b).

Theorem 5. Recursive equations (2) and (3) yield the optimal piecewise-linear upper envelope approximation of with exactly and at most linear segments, respectively, having the squared approximation error as the objective, provided that the coefficients for the linear segments are computed by solving problem given by (10a) and (10b).

The running time of the resulting algorithm is where is an upper bound on the cost to solve the quadratic program with variables and constraints.

Algorithm PL-Fixed applied to the sample instance, with the coefficients obtained by solving and a fixed number of segments , yields the optimal sequence with cost . The slopes and intercepts of the segments are for ; for ; and for .

The models and algorithms presented in Section 3 can be modified to produce continuous lower and upper envelopes. For PL-Fixed-Cont-Reject, it suffices to use the slope and intercept obtained by solving . This algorithm applied to the sample instance with yields the optimal sequence with cost . The domains of the linear segments are for the first segment, for the second, and for the third. This approximation is illustrated in Figure 3.

For the algorithm that computes linear segments by solving QP problem (7a), (7b), and (7c), we only need to introduce the linear inequalities that ensure bounding, such as the constraints (10b) for an upper envelope approximation.

5. Dealing with Random Data

Uncertainty is addressed by modeling each point as a random variable characterized by a probability density function . Two models are proposed, one minimizing expected squared error and the other using chance constraints [13].

5.1. Minimizing Expected Squared Error

Uncertainty may be treated by minimizing the expected squared error of the piecewise-linear approximation. For the points in the set , this approach finds a linear segment given by the slope and intercept that solves the problem: where is the domain of , is the joint probability density function for , and is the expected error incurred by when approximating with a linear segment.

The function is convex on the variables since the term is convex for fixed , generalizing an infinite weighted sum [14, Sec. 3.2.1]. This means that is a convex function and so is problem . Hence, an optimal solution to can be found with numerical optimization methods, such as the interior-point method, using numerical differentiation procedures or directly when a closed form is known for the gradient.

The basic algorithms of Section 2 yield the optimal approximation, for a fixed and maximum number of segments, by solving and using the obtained coefficients for the linear segment approximating . This result is formalized in the theorem below, which can be easily proven.

Theorem 6. Recursions (2) and (3) yield the optimal piecewise-linear approximation of a set of random points with exactly and at most segments, respectively, having the expected squared approximation error as the objective if the coefficients for the linear segments are obtained by solving given by (11a) and (11b).

The resulting algorithms run in time, with being an upper bound on the running time to solve with variables and integration terms.

A sample instance is obtained by defining and as independent random variables, both characterized by Gaussian distributions with mean and and standard deviations and . Table 3 shows the statistics of the random points of the stochastic sample instance. Thus, the joint probability density function is

With a fixed number of segments and , PL-Fixed produced the optimal sequence with an expected approximation cost . This piecewise-linear function differs from the one with zero variance, for which the optimal sequence is with cost .

5.2. Lower and Upper Envelopes

Instead of minimizing only the expected squared error, approximations for the lower and upper envelope can be obtained by introducing probabilistic constraints to problem . An algorithm for the lower envelope is developed below, which is easily adapted for the upper envelope.

To find the lower envelope of , the constraint that the random point is above the linear segment , , is imposed with a given probability . Mathematically,where denotes the probability operator. The DP algorithms can find the lower envelope by changing how the slope and intercept of the linear segments are computed. The problem of finding the segment coefficients that minimize the expected squared error, subject to the probabilistic constraints for the lower envelope, is while being subject toBy expressing probabilistic constraint (14c) in terms of integrals and the joint probability density function , a deterministic equivalent to this problem is obtained: while being subject towhere is the probability of lying below the linear segment that approximates . This problem may be posed in terms of the probability of being above the linear segment.

Problem is a nonlinear program for which algorithms reach a locally optimal solution, such as sequential quadratic programming (SQP).

The basic algorithms can find the optimal approximation, for a fixed and maximum number of segments, by solving and using the obtained coefficients in the approximation of . This result is formalized in the theorem below, whose proof is similar to that of Theorem 1.

Theorem 7. Recursions (2) and (3) yield the optimal piecewise-linear approximation of a set of random points with exactly and at most segments, respectively, having the expected squared approximation error as the objective and subject to chance constraints to probabilistically ensure the lower envelope, if the coefficients for the linear segments correspond to the optimal solution of problem .

Algorithm PL-Fixed was applied to the stochastic sample instance given in Table 3, with a fixed number of segments and probabilities and . The problems were solved approximately using an SQP algorithm. For , the algorithm produced the sequence for the probabilistic lower envelope. The expected cost of this solution is , above the expected cost of obtained for the solution without chance constraints. For , the algorithm produced the sequence with an expected cost of .

6. Computational Analysis

In order to keep the analysis brief, the experiments were restricted to the following approaches:(1)PL-Max, the basic model of Section 2.2 which enforces a maximum number of linear segments.(2)PL-Max-Cont, the version of the basic model that imposes continuity by solving quadratic programs according with the developments of Section 3.3.(3)PL-Max-Bound, the version of PL-Max which minimizes the number of linear segments while ensuring a bound on approximation error.

The algorithms were implemented in C++ in a workstation equipped with an Intel i7 CPU, 2.67 GHz, with 8 GB of RAM, and running the Linux operating system version Ubuntu 2.6.35-32.

The CFSQP software package and its embedded QP solver were used to solve optimization problems [15]. CFSQP follows the sequential quadratic programming (SQP) framework which solves a sequence of QPs approximating the original problem but having the Lagrangian as the objective to capture the curvature. The gradient of the objective and constraints were provided analytically. The Hessian of the objective was computed numerically using the BFGS strategy.

The approaches were put to the test in the piecewise-linear approximation of the production function of gas-lifted oil wells. As an oil field matures, the reservoir internal pressure reduces to a level that is not sufficiently high to sustain natural production, requiring the application of artificial lifting techniques. Among these techniques, gas-lift is widely used by the industry for its robustness and relatively low operating costs [16]. Gas-lift works by injecting high pressure gas at the bottom of the production tubing, reducing the fluid density due to gasification and thereby lifting the fluid to the surface.

For the computational analysis, the generation function corresponds to the total mass flow of oil, water, and gas given as a function of the lift-gas rate injected in an oil well . The relation is referred to as well production function. This work adopted the oil production system presented in [4], which consists of subsea gas-lifted wells, pipeline-risers, and topside processing equipment such as compressors and separators. The production system is modeled in Pipesim, a commercial multiphase flow simulator widely adopted by the oil industry.

A detailed piecewise-linear model was obtained by uniformly sampling the simulation model of for a sufficiently high number of values. In our experiments, 50 breakpoints produced a highly accurate and detailed description of . However, simpler and sufficiently accurate piecewise-linear approximations can be obtained by a better distribution of the breakpoints within the function domain, a task to be performed by the algorithms proposed in this work.

Table 4 shows the running time of the three implemented strategies for piecewise-linear approximation of 10 oil-well production functions, each originally represented with 50 breakpoints. For PL-Max and PL-Max-Cont, the maximum number of segments was set to and the cost per segment to . For PL-Max-Bound, and the approximation bound was . The CPU times are consistent with the time complexity of the algorithms: PL-Max is faster than PL-Max-Bound due to the extra constraints of the latter; in turn, PL-Max-Bound is faster than PL-Max-Cont because he latter needs to solve QP problems.

The behavior of the PL-Max strategy is shown in Table 5 for a varying cost and a maximum number of segments . The algorithm was applied to the production function of well #1.

The squared error grows as increases, corroborating the theoretical behavior.

The behavior of PL-Max-Cont strategy is illustrated in Table 6 for a varying number of maximum segments. The algorithm was also applied to the production function of well #1. As expected, the squared error decreases as the limit increases.

The computational experiments show that piecewise-linear functions respecting the desired criteria can be obtained with fewer breakpoints than the original representation. Such simpler piecewise-linear models can have a major impact on the production optimization of large oil fields [17].

7. Summary and Conclusion

Applications of piecewise-linear models abound in science and engineering. In petroleum engineering, the performance curve of an oil well can be approximated with a piecewise-linear function. Owing to their importance, this paper developed models for piecewise linearization of nonlinear functions that trade off model complexity and approximation quality. The models and algorithms serve as decision-support tools for engineers to find piecewise-linear models that fit best their needs.

Several models were developed for the synthesis of piecewise-linear approximations under a variety of conditions. Given a potentially large set of graph points generated by a function , not necessarily known explicitly, the problem consists in finding a reduced subset of points that induces a more compact, yet sufficiently accurate piecewise-linear approximation of . To this end, a number of criteria were proposed in order to formally state the approximation problems for which algorithms have been designed. The main contributions of this paper consist of the following formulations and algorithms for piecewise-linear approximation:(1)A model and dynamic programming algorithm for finding the optimal approximation using a fixed number of linear segments, which minimizes a quadratic error with respect to the point set .(2)A model and DP algorithm to find the optimal approximation using a maximum number of segments, which minimizes the quadratic approximation error and a fixed cost per segment.(3)Variations of the two preceding models and algorithms, referred to as basic approaches, to ensure a continuous piecewise-linear approximation.(4)Variations of the basic approaches that yield optimal upper and lower envelopes for .(5)Extensions of the preceding approaches to handle uncertain data, whereby the points of are random variables.

The developed approaches can be combined or extended to create models and algorithms that are tailored to the user’s needs, such as the synthesis of a piecewise-linear function approximating the upper envelope, with continuity and in the presence of random data points. The proposed models measured error with the -norm; however other norms can be easily adopted such as the - and -norm by solving suitable linear and quadratic programs.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work was supported in part by CNPq under Grant 302972/2013-7.