Abstract

The technique of operational analysis (OA) is used in the study of systems performance, mainly for estimating mean values of various measures of interest, such as the number of jobs at a device and response times. The basic principles of operational analysis allow errors in assumptions to be quantified over a time period. The assumptions used to derive the operational analysis relationships are studied. Using the Karush-Kuhn-Tucker (KKT) conditions, bounds on error measures of these OA relationships are found. Examples of these bounds are used for representative performance measures to show limits on the difference between true performance values and those estimated by operational analysis relationships. A technique for finding tolerance limits on the bounds is demonstrated with a simulation example.

1. Introduction

The analysis of the performance of a network of devices is important in many areas. Computer systems and industrial manufacturing systems are two examples. The types of networks considered in this paper are operationally connected queue and server devices. That is, each device is connected in some way with every other device in the network and each device may have a queue assigned to it. Certain information about these types of networks may be obtained using a technique known as operational analysis (OA). Relationships used to estimate performance measures (PMs) of networks may be derived in operational analysis under a few restrictive assumptions. OA was originally defined as an aid in computer system performance analysis [1–6]. It can be an aid in the understanding of system performance in general [7] and is a complementary approach to the stochastic analysis used in many performance analyses of networks of servers and in computer programs [8–14]. Other used or suggested applications for the OA approach include telecommunications [15], E-commerce [16, 17], flexible manufacturing systems [18], and Petri nets [19–21]. The performance measures derived include the average number of units at a device, average response time, and throughput. The behavior of a single, arbitrary device in a network will be considered.

Two basic principles define the OA approach [2].

(1) All assumptions that are made in analyzing the performance of a real system should be subject to direct verification.

(2) All variables that appear in any equation which characterize the performance of a real system should be verifiable by direct measurement.

The validity of PM equations developed using these principles can be shown for a particular set of data because they are based on assumptions which can be directly tested by the observation of data produced by the system of interest over a finite period of time.

The most widely used assumption about the data is that of job flow balance, that is, the number of arrivals to a network (for global flow balance) or to a device (for local flow balance) must be equal to the number of departures from that network or device. Also assumed is one step behavior: only one unit may arrive or depart the network or device at a time. Arrivals and completions do not occur simultaneously. The OA approach assumes that devices must have homogeneous service. That is, the service time of a device in a network is independent of the queue length at any device. Homogeneous arrivals are the corresponding condition for the arrival times. Homogeneous routing holds when the routing frequencies of jobs leaving a device are independent of the queue lengths at other devices in the network. Device homogeneity exists when the rate of output from a device is determined only by its queue length. Other assumptions can be invoked as the need arises to derive OA relations [22]. The only requirement is that these assumptions meet the two basic principles of testability given above.

OA assumptions allow for the development of relationships which enable us to determine PMs by collecting only a few types of data, namely, the number of arrivals and departures for each device state and the total time spent in each state [23]. These PMs will be accurate only if the OA assumptions are met and only for the finite time period observed. The accuracy of the assumptions can be measured. A device state is the number of items (customers, jobs, entities, etc.) both waiting and in service at a device.
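As a minimal sketch of this data collection, the function below tallies, for each device state, the arrivals, the completions, and the time spent in that state over a finite observation period. The event format, the function name `tally_states`, and the sample events are hypothetical and chosen only for illustration; one-step behavior is assumed.

```python
from collections import defaultdict


def tally_states(events, start_time, end_time, initial_jobs=0):
    """Tally per-state arrivals, completions, and time in state.

    events: (time, kind) pairs sorted by time, kind is "arrive" or "complete".
    The state n is the number of jobs both waiting and in service.
    """
    arrivals = defaultdict(int)         # arrivals that find n jobs at the device
    completions = defaultdict(int)      # completions that occur with n jobs present
    time_in_state = defaultdict(float)  # total time spent in state n
    n = initial_jobs
    clock = start_time
    for t, kind in events:
        time_in_state[n] += t - clock
        clock = t
        if kind == "arrive":
            arrivals[n] += 1
            n += 1
        elif kind == "complete":
            if n == 0:
                raise ValueError("completion observed at an empty device")
            completions[n] += 1
            n -= 1
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    time_in_state[n] += end_time - clock
    return dict(arrivals), dict(completions), dict(time_in_state)


# Hypothetical behavior sequence observed on the interval [0, 6].
events = [(1.0, "arrive"), (2.0, "arrive"), (4.0, "complete"), (5.0, "complete")]
A, C, T = tally_states(events, start_time=0.0, end_time=6.0)

# Job flow balance holds for this sequence: arrivals equal completions.
flow_balanced = sum(A.values()) == sum(C.values())
```

Because every quantity is a count or an accumulated duration over the observed period, each OA assumption can be checked directly against these three tallies.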

While OA research was originally proposed as an aid in computer performance analysis, it is more general in that developments can be applicable to any system that generates time series data. This would include computer simulation.

The Abbreviations section gives definitions of the variables used in this paper. Error measures of various OA assumptions have been defined and are summarized in Table 1 for job flow balance, homogeneous service, and homogeneous arrival [24]. The limit over time of the expected value and variance of the job flow balance error is zero [25], so this error is not significant for data runs of reasonable length. The expected value over time of other error measures, such as those for the assumptions of homogeneous service and arrival, may not, in general, tend to zero [25].

Different error measures, in the form of relative errors, have been defined by Brumfield [22]. By presenting a set of new assumptions, formulas for the calculation of response time and average queue length in terms of the average and coefficient of variation of service times are developed. Two examples of these new assumptions are homogeneity of queueing and service and homogeneity of residuals. For arrival “forward residual is either the time remaining in the service period during which arrives or zero if arrival begins a service period … similarly backward residual is either the time since the beginning of the service period during which arrives or zero if arrival begins a service period” [22]. Relative error formulas for response time are determined with these new assumptions in addition to the old assumptions of homogeneous service and homogeneous arrivals. Unfortunately, the error terms are quite complex since there are more assumptions with which to deal.

If we are using a relationship to determine a PM derived under OA assumptions, then the resulting value of the PM is in error if the founding assumptions do not hold. This can be checked from the data because of the way OA assumptions are defined. The degree of error in the PM calculated is a function of the assumption error measures. Correction terms have been developed using the assumption error measures [24]. When added to the PMs these correction terms produce exact results. It is these correction terms which are studied in this paper.

As an example, assume we are interested in obtaining a value for the average number of jobs in a computer system that a new job sees upon arrival. If we make the homogeneous service assumption, then this average may be estimated by [4] where is average number of jobs at a device seen by an arriver, assuming homogeneous service, is average number of jobs at the device, and is device utilization.
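One way such an estimate can arise is sketched below. The notation is an assumption made for this sketch and may differ from that of [4]: $p(n)$ is the proportion of time in state $n$, $A(n)$ and $C(n)$ are the arrivals and completions with $n$ jobs present, $T(n)$ is the time in state $n$, $S$ is the mean time between completions during busy periods, $U = 1 - p(0)$, and $\hat{n}_A$ is the arriver's average under homogeneous service. The sketch also assumes one-step behavior and job flow balance.

```latex
% Sketch under assumed notation; not necessarily the form given in [4].
\begin{align*}
A(n) &= C(n+1)
  && \text{(one-step behavior and flow balance)} \\
C(n) &= \frac{T(n)}{S}, \quad S = \frac{UT}{C}
  && \text{(homogeneous service, } n \ge 1\text{)} \\
p_A(n) &= \frac{A(n)}{A} = \frac{C(n+1)}{C} = \frac{T(n+1)}{UT} = \frac{p(n+1)}{U} \\
\hat{n}_A &= \sum_{n \ge 0} n\, p_A(n)
  = \frac{1}{U} \sum_{m \ge 1} (m-1)\, p(m)
  = \frac{\bar{n} - U}{U}
\end{align*}
```

The last step uses $\sum_{m \ge 1} m\, p(m) = \bar{n}$ and $\sum_{m \ge 1} p(m) = U$, so only the average number at the device and the utilization are needed.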

We are interested in finding a correction term, such that The correction term is equal to where Check the Abbreviations section for all symbol definitions. is the error measure for the job flow balance assumption and is a weak form of the homogeneous service error measure because it may be equal to 0 even when some or all the individual values are not.

There are a couple of problems with using the error measures to derive correction terms. One problem is that the amount of data needed to calculate an exact value for an assumption error measure is the same as to find the performance measure of interest directly. The process of determining the error measure for each assumption used to derive a relation, the correction term, and the PM estimate to which the correction term is added is a way of getting something which may be observed more directly. Finding exact values for performance measures in this indirect way over a finite time period may be worthwhile only if a number of PMs are desired. In this case, a single assumption error measure is determined for each assumption and applied to all the PM correction terms of interest.

Another problem with the error measure technique is that these measures apply only to the data observed. For another run of data, new error measure values need to be found. This limitation may be acceptable if PMs cannot be measured directly without changing the nature of the system, for example, in a complex computer system. We would like a way to extend assumption error measures over longer sets of data and, thereby, say something about the system that generated the data. As Sevcik and Klawe [26] stated shortly after OA was introduced “Because operational analysis is based on assumptions that can be tested but that are very unlikely to be satisfied exactly in any finite time period, it is very important to develop a means of dealing with ‘fuzzy homogeneity’ or situations in which the various independence assumptions are satisfied within some tolerance.” This paper addresses this need to define these assumption bounds.

The next section will illustrate how OA relations may be used to reduce data collection while estimating performance measures. This will be followed by a discussion of the determination of bounds on the OA assumption measurement errors for homogeneous service and homogeneous arrival. Sample calculations of these bounds will be presented afterward. An illustration of the use of bounds in a simulation will then be given.

2. Simplifying Data Collection

Calculating performance measures with OA relationships that are derived under one or more of the system behavior assumptions is usually simpler than using more direct relationships. This is because by making the assumptions a model has been created which reduces, perhaps artificially, the complexity of the behavior of the system. The result is that less information is needed to make an estimate of the PM than would be needed for a direct measurement.

There are situations where it is impractical, if not impossible, to collect sufficient data to determine exact values for PMs over a finite period of time. In some cases, only an estimate of a PM is needed and it is not worthwhile to go to the trouble of determining the precise PM values. Any PM value obtained for a behavior sequence is only an estimate of the underlying system PM. With this realization in mind, it may seem unwise to spend a great deal of effort to obtain an exact value for a sequence which is, in turn, only an estimate of some other value. A good approximation of the sequence estimate may be sufficient.

If we want the average response time, $R$, of a behavior sequence, we could accumulate the response times of all the jobs that go through the system and get the exact $R$ by dividing by the number of jobs. A simpler procedure would be to say that response time is $$R = \frac{S}{1 - U},$$ where $S$ is mean time between completions during busy periods and $U$ is utilization.

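This equation will give the exact value if we have a behavior sequence for a single server queue which is in flow balance and has homogeneous arrivals and services. If these conditions do not hold, the equation will not give the response time $R$ exactly, but an estimate of $R$, call it $\hat{R}$. If we collect only the idle time, $I$, and the number of completions, $C$, we can use the same equation to find $\hat{R}$. If the behavior sequence lasts for time $T$, then the utilization is $U = (T - I)/T$, the mean time between completions during busy periods is $S = (T - I)/C$, and $$\hat{R} = \frac{S}{1 - U} = \frac{T\,(T - I)}{C\,I}.$$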
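The reduced data requirement can be illustrated with a short sketch that computes the response time estimate from only the observation time, the idle time, and the number of completions. The function name and the sample numbers are hypothetical; the formula is the single-server relation of response time to mean time between completions and utilization described above.

```python
def estimate_response_time(total_time, idle_time, completions):
    """OA response time estimate from total time, idle time, and completions."""
    if completions <= 0:
        raise ValueError("at least one completion is required")
    if not 0 < idle_time < total_time:
        raise ValueError("idle time must lie strictly between 0 and total time")
    busy_time = total_time - idle_time
    utilization = busy_time / total_time    # fraction of time the device is busy
    service_time = busy_time / completions  # mean time between completions while busy
    return service_time / (1.0 - utilization)


# Hypothetical run: 100 time units observed, 20 idle, 40 completions.
r_hat = estimate_response_time(total_time=100.0, idle_time=20.0, completions=40)
# Here utilization is 0.8 and the service time is 2.0, so r_hat is about 10.
```

No per-job timestamps are kept, which is the saving over accumulating each job's response time.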

Another example calculates the average number of jobs at a device. With the same assumptions as for estimating response time, the average number of jobs in the queue/server system is $$\hat{n} = \frac{U}{1 - U},$$ where $U$ is the utilization. This value takes even less data to calculate than does the response time estimate $\hat{R}$. The direct calculation of the average number of jobs, $\bar{n}$, requires accumulating data every time there is an arrival or completion or requires keeping track of the total amount of time spent at each of the states.
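The contrast between the direct calculation and the OA estimate can be sketched as follows. The function names and the time-in-state values are hypothetical; the estimate uses only the idle time and the total time, while the direct value needs the time spent in every state.

```python
def direct_mean_jobs(time_in_state):
    """Time-average number of jobs, using the time spent in every state."""
    total = sum(time_in_state.values())
    return sum(n * t for n, t in time_in_state.items()) / total


def estimated_mean_jobs(total_time, idle_time):
    """OA estimate U / (1 - U), written in terms of idle and total time."""
    if not 0 < idle_time < total_time:
        raise ValueError("idle time must lie strictly between 0 and total time")
    return (total_time - idle_time) / idle_time


# Hypothetical time-in-state data: 2 time units in each of states 0, 1, and 2.
time_in_state = {0: 2.0, 1: 2.0, 2: 2.0}
n_direct = direct_mean_jobs(time_in_state)
n_hat = estimated_mean_jobs(total_time=6.0, idle_time=time_in_state[0])

# The gap between the two is what a correction term would have to supply.
gap = n_direct - n_hat
```

For these made-up numbers the assumptions do not hold exactly, so the estimate and the direct value differ.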

Using these equations for predicting future values of the response time and the average number of jobs of a system presents certain problems. For example, over future time, will the system continue to satisfy or violate the assumptions in the same way? Since we can determine error measures of the assumptions and use them to correct assumption-derived PM estimates, it is not necessary that assumptions hold in the future if they have not held in the past. With the determination of correction terms, all that is really necessary is for the correction terms to remain relatively constant, that is, for the system’s violations of assumptions to remain the same over future time periods.

Without knowledge of the assumption error measures and through them the correction terms, the performance measure estimates may be quite bad for any particular behavior sequence [23]. As stated before, in a stable system the job flow balance assumption error measure will go to zero as time increases, but, as shown in [24], this is not necessarily true for other assumption error measures. For any behavior sequence it is important to make some assessment, if possible, of the behavior of the PM correction terms.

3. Performance Measure Bounds

One approach to using the simplified OA formulas for PMs is to determine bounds on the maximum PM error. That is, we are interested in defining bounds on the difference between true values of various PMs at a device for particular state sequences and those PMs estimated by using relationships derived under operational analysis assumptions. We will assume the network is in steady state.

In the following, bounds are found for the assumptions of homogeneous services and homogeneous arrivals. In the case of the job flow balance assumption, we know that the expected value of the error measure and its variance go to zero as the observation period grows [25]. Therefore, as the length of the sequence increases, the job flow balance error can be expected to become insignificant.

We will need to assume that a maximum value of the error for services and arrivals for any state is known or can be set. Call these values and for the maximum service error and maximum arrival error, respectively.

3.1. Bounds on Homogeneous Service Assumption Error

If is the maximum error for the homogeneous service assumption, then A more useful bound would be on the weak overall homogeneous service error: But, this limit may be harder to know beforehand. Equation (9) may be used to find an upper bound on by using the definition Substituting (9) yields The term is the average number at the device seen by a completer. Therefore,

The bound given by (14) does not take into consideration the fact that the values are not independent. In fact, they are related by the expression We can get a tighter bound of the values by taking this dependence into consideration. Equation (15) can be shown by substituting the definition as follows:

Since what is desired is an upper bound on a solution to the optimization problem below will give the desired result: In order to show the optimal solution, first put this problem in primal and dual forms:primal: dual:

The optimal solution to the problem will have to satisfy the Karush-Kuhn-Tucker (KKT) conditions, that is, feasibility of the Primal and Dual, as well as complementary slackness [27]. The KKT conditions give the necessary conditions for optimality of the general constrained problem.

Checking dual feasibility, for any the constraints can be satisfied by construction such that if , then , and if , then , . This is because the main dual constraint is

In order to satisfy complementary slackness, if , then it must be true that , and if , then it must be true that . In terms of the above primal-dual construction, if , then and if , then . Any solution , and satisfying the primal constraints and the above two conditions is optimal.

Consider the solution where is the median state at completions.

Assume for simplicity that there is an even number of states so that for any . This solution is dual feasible since we showed above that any is a solution to the dual.

Checking primal feasibility, the solution satisfies   and .

The main constraint is since is median.

Lastly, we check for complementary slackness. Now, when ; then and when ; then . Therefore, complementary slackness holds and the solution is an optimal one.

The solution value is Set this value equal to , which is the overall completer’s average minus the average of the set truncated at the median. This can be shown by first taking times the completer’s average, , is Subtracting yields We know that since is a median. Therefore, is the average of the set of states truncated at the median. So the bound on is

As an example, take the behavior sequence in Figure 1. If we want to use the OA equation [24] to calculate , which is the average number assuming flow balance, homogeneous arrival, and services, we would be interested in the bound of the difference between and . We can calculate ,  and  . Assume the maximum error is . If the other assumption errors are zero, then the difference, is equal to the correction term: The upper bound on this correction term, using (14), is Using the tighter bound, , we get This is a reduction of 57.14%, for this example of the difference bound.
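The two completion-based statistics this subsection relies on, the average state seen by a completer and the median state at completions, need only the per-state completion counts. The sketch below is illustrative: the function names and the counts are hypothetical (they are not the data of Figure 1), and the median is taken, as an assumption, to be the smallest state at or below which at least half of the completions occur.

```python
def completer_average(completions):
    """Average number at the device seen by a completing job."""
    total = sum(completions.values())
    return sum(n * c for n, c in completions.items()) / total


def median_completion_state(completions):
    """Smallest state m with at least half of all completions at states <= m."""
    total = sum(completions.values())
    running = 0
    for n in sorted(completions):
        running += completions[n]
        if 2 * running >= total:
            return n
    raise ValueError("no completions recorded")


# Hypothetical completion counts: state -> completions with that many jobs present.
completions = {1: 4, 2: 3, 3: 2, 4: 1}
n_c = completer_average(completions)
m_c = median_completion_state(completions)
```

With an even split of completions around a state boundary, other median conventions are possible, so the convention used here should be checked against the intended definition.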

3.2. Bounds on Homogeneous Arrival Assumption Error

As in the previous section, we can assume that a maximum error, , for any state is known beforehand. That is, we assume Then, the weak overall homogeneous arrival error is bounded by where is the average number at the device excluding the maximum state.

As with the service errors, , the are not independent. The dependency is This can be shown by substituting the definition of into the equation. As with the homogeneous service assumptions, we set up the following optimization problem: Using the Karush-Kuhn-Tucker conditions as before, we can show that with as the median state is the solution to the optimization problem. The value of the solution is found by substituting into the primal objective function to get times the average excluding the maximum, , is Subtracting therefore, The expression is the average of the values truncated at the median. Call this value . Then, substitution yields an upper bound on the error measure due to violations in homogeneous arrivals of

This bound may not be as useful as the bound on the homogeneous service assumption error because the value of is based on knowing the values, whereas, for only completion counts are necessary.

4. Example Performance Measure Error Bounds

Some examples are given next of how and may be used to determine bounds on the difference between exact values of various PMs and the OA estimated values for particular behavior sequences.

4.1. Arriver’s Average Queue Length

Using the example from the introduction, for a behavior sequence the average queue length seen by an arriving job may be calculated by Where   and is the average arriver’s queue length assuming homogeneous services.

Rearranging and using the bound give Since for any sequence of data in steady state, we assume flow balance holds. Then, This expression shows us that the difference between estimating the average length seen by arrivers with the relationship that assumes homogeneous servers and the true value of is less than or equal to the bound on .

4.2. Response Time

The exact response time for a behavior sequence can be found by where   and .

Since, as before, we are interested in the difference between a possible observed value () and a calculated value () we should do some rearranging and get . Again, assuming is small and substituting yield

4.3. Average Number at Device

If homogeneous arrivals, homogeneous services, and job flow balance hold then the average number at a device (i.e., those both in queue and in service) can be calculated by The exact average number if these assumptions do not hold is where Rearranging again, and substituting , , and we get if or if .

Because the calculation of requires knowledge of values, there may be little benefit in using the right hand side of the above relationship to find this error bound. If, however, the behavior sequence can be assumed to have homogeneous arrivals, then we can get a bound on the average number at a device error without the knowledge of the individual values. In that case, the only time statistics we need are utilization, , and the fraction of time spent at the maximum state, .

4.4. Throughput

Using the OA version of Little’s Law [1] we can say that the difference between a behavior sequence’s actual throughput and that calculated assuming both homogeneous arrivals and services will be

For , we can substitute which is the more general expression since it does not need the homogeneous arrival assumption. From previous developments we know the following relations hold: where and , where

Substituting into (58) gives If the bound of (60) is substituted into (62), then it must be that If instead of bound (60), bound (61) is substituted, the numerator can only decrease and the denominator only increase; therefore, If both (60) and (61) are substituted, then the value of the expression must fall within the limits defined by (63) and (64). Therefore, the bounds on our throughput error are

5. Using the PM Bounds

The bounds derived in the previous sections are actually limits on PM correction terms. Assuming we know the and values a priori and that we can say something about the homogeneous arrival assumption, then these bounds can be found without knowledge of all the values for each . This is the same simplification of data collection that we have in using the OA formulas instead of direct calculations.

As an example of using the bounds in a simulation study, assume we are interested in finding an estimate for the average number at a device. A series of 10 runs is made and, for each run, the bound on the correction term is calculated. If the bound is positive, we call it an upper bound; if it is negative, we call it a lower bound. In both cases the other bound is 0. Assume that in the simulation runs these bounds always fall between −6 and 3.3. The correction term for each run is approximated by the midpoint between the upper and lower bounds. We would like to be able to say something about the probability that future runs will fall within the limits that have already appeared. We can do this using tolerance limit calculations.
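The rule for turning a signed bound into an interval and an approximate correction term can be sketched as below. The function name and the three sample bounds are hypothetical and are not the values of Table 2.

```python
def correction_interval(bound):
    """Return (lower, upper, midpoint) for a signed correction-term bound.

    A positive bound is an upper bound, a negative bound is a lower bound,
    and the other end of the interval is 0 in both cases.
    """
    lower, upper = (0.0, bound) if bound >= 0 else (bound, 0.0)
    return lower, upper, (lower + upper) / 2.0


# Hypothetical bounds from three simulation runs.
bounds = [-6.0, 3.3, -2.4]
midpoints = [correction_interval(b)[2] for b in bounds]
```

The midpoints are the per-run approximations whose spread the tolerance limit calculation then characterizes.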

Assume the 10 runs produced the results given in Table 2. The average and standard deviation for these observations are −1.34 and 1.55, respectively.

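Since the tolerance limits are going to be set at the observed limits, we can say that the mean minus $K$ standard deviations equals −6 and the mean plus $K$ standard deviations equals 3.3, which gives a tolerance limit multiplier of $K \approx 3.0$ (since $-1.34 \pm 3.0 \times 1.55 \approx -6, 3.3$). From tolerance limit tables [28] we can say with 95% confidence that at least 91% of future observations of the correction term will fall within the interval $[-6, 3.3]$. That is, using these limits, we have 95% confidence that the correct values for each run will fall within them at least 91% of the time.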
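The arithmetic behind this statement can be checked with a short sketch. The multiplier is computed from the reported mean, standard deviation, and observed limits. The coverage is then approximated with Howe's formula for two-sided normal tolerance factors, which is an assumption of this sketch and stands in for the table lookup in [28]; the chi-square quantile is entered as a constant from standard tables, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Lower 5% quantile of the chi-square distribution with 9 degrees of freedom
# (standard table value), used for 95% confidence with n = 10 runs.
CHI2_05_DF9 = 3.325


def tolerance_multiplier(mean, sd, lower, upper):
    """Largest k such that mean +/- k*sd stays inside [lower, upper]."""
    return min((upper - mean) / sd, (mean - lower) / sd)


def approx_coverage(k, n, chi2_lower):
    """Coverage implied by k under Howe's approximation (normal data assumed).

    Howe: k ~ z * sqrt((n - 1) * (1 + 1/n) / chi2_lower), solved here for the
    coverage proportion p, where z is the (1 + p)/2 standard normal quantile.
    """
    z = k / sqrt((n - 1) * (1 + 1 / n) / chi2_lower)
    return 2 * NormalDist().cdf(z) - 1


k = tolerance_multiplier(mean=-1.34, sd=1.55, lower=-6.0, upper=3.3)
coverage = approx_coverage(k, n=10, chi2_lower=CHI2_05_DF9)
```

Under these assumptions the multiplier comes out just under 3 and the approximate coverage a little above 0.91, in line with the tabulated figure quoted above.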

6. Conclusion

In this paper, bounds were developed for the operational analysis error measures of homogeneous service and arrival assumptions. These bounds allow us to take advantage of the simplified data collection made possible by the use of operational analysis relationships, even when the assumptions used to derive those relationships are violated. A tolerance limit based method was given in order to be able to say something about the confidence that future correction term values in a time series would be within certain limits.

Abbreviations

Total number of job completions at a device
:Total number of completions when n jobs are at a device
:Correction term makes the OA term equal to the directly calculated performance measure
:The set of states
:The set of states
:Tolerance limit multiplier
:The maximum number of jobs seen at a device, both in queue and in service
:State of a device, number of jobs both in queue and in service
:The median state seen by a device
:The median state at a device at job completions
:Average number of jobs at a device
:The average number of jobs at a device assuming flow balance, homogeneous arrivals, and homogeneous services
:The average number at the device seen by a completing job
:The average of the set of states seen by completing job truncated at the median state value
:The average state at a device, excluding the maximum state
:Time-average of the set of states truncated at the median
:Average number of jobs at a device seen by an arriver, assuming homogeneous service
:Proportion of time spent in state
The proportion of arrivals when jobs are at a device,
The proportion of completions that leave jobs at a device,
:Average response time
:Average response time assuming homogeneous services
:Average response time assuming homogeneous arrivals and homogeneous services
:Mean time between completions during busy periods
:Standard deviation
:Total time of the period of observation
:Total time device was in state
:Utilization, $1 - p(0)$
:Throughput
:Throughput assuming homogeneous arrivals and homogeneous services
:Maximum error in homogeneous arrival assumption error measure
:Maximum error in homogeneous service assumption error measure
:A bound on the homogeneous arrival assumption error measure,
:A bound on the homogeneous service assumption error measure, .

Conflict of Interests

The author declares that there is no conflict of interests regarding the publication of this paper.