Abstract

The basic formula for calculating the sample variance is based on the sum of squared differences from the mean. From a computational perspective, calculating the mean is undesirable because it can introduce rounding errors. Previous research has proposed using a weighted formula of the successive differences to calculate the sample variance without calculating the mean. However, that weight formula is not in a unified format, in the sense that it has to be represented as two formulas. This paper proposes a unified weight formula for calculating the sample variance from weighted successive differences. A proof is provided to show that the sample variance calculated using the proposed unified weight formula is mathematically equivalent to the basic definition.

1. Introduction

Sample variance calculation is a fundamental task in many data analysis applications. The basic formula for calculating a sample variance is based on the sum of squared differences from the mean. Given a set of data $x_1, x_2, \ldots, x_n$, the sample variance, denoted as $s^2$, is calculated as follows:

$$s^2 = \frac{S}{n-1}, \tag{1}$$

where $S = \sum_{i=1}^{n} (x_i - \bar{x})^2$ is the sum of squared differences from the mean and $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ is the sample mean. Von Neumann et al. [1] pointed out that (1) does not take into account the order of the observations. They proposed instead using successive differences of the data so that the order can be considered. Specifically, they used

$$\delta^2 = \frac{1}{n-1}\sum_{i=1}^{n-1} (x_{i+1} - x_i)^2, \tag{2}$$

where the subscript $i$ refers to the temporal order of the data. Define $d_i = x_{i+1} - x_i$, $i = 1, \ldots, n-1$, as the successive differences of the input data. From a computational perspective, von Neumann's formula is also advantageous, as it avoids a mean calculation that may introduce rounding errors.

The problem with von Neumann's formula is that it is not mathematically equivalent to the basic definition. This problem was solved independently by Eilon and Chowdhury [2] and Joarder [3], who used weighted successive differences to derive a formula that is mathematically equivalent to the basic definition.
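The gap between the two statistics can be seen in a minimal Python sketch. It assumes the mean square successive difference is normalized by $n-1$; the function name and the data are illustrative and are not taken from the cited works.

```python
from statistics import variance


def mean_square_successive_difference(x):
    """Sum of squared successive differences divided by n - 1 (assumed normalization)."""
    n = len(x)
    if n < 2:
        raise ValueError("at least two observations are required")
    return sum((b - a) ** 2 for a, b in zip(x, x[1:])) / (n - 1)


# Hypothetical data: the same five values in two different temporal orders.
ordered = [1.0, 2.0, 3.0, 4.0, 5.0]
reordered = [3.0, 1.0, 5.0, 2.0, 4.0]

for series in (ordered, reordered):
    # The sample variance ignores the order; the successive-difference statistic does not.
    print(variance(series), mean_square_successive_difference(series))
```

In this sketch the sample variance is the same for both orderings, while the unweighted successive-difference statistic changes with the order and matches the variance in neither case.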

Eilon and Chowdhury [2] considered a job scheduling problem in which they wanted to minimize the variance of the jobs' waiting times. Let $W_i$ be the waiting time of the $i$th job. By definition, $W_1 = 0$, as the first job does not have a waiting time, and $W_i = W_{i-1} + p_{i-1}$ for $i = 2, \ldots, n$, where $n$ is the number of jobs and $p_i$ is the processing time of job $i$. The objective is to minimize the variance of the waiting time, or equivalently $S$. For this purpose, there is a need to quickly update $S$ when jobs $i$ and $j$ are swapped. Notice that when jobs $i$ and $j$ are swapped, most of the jobs' waiting times change accordingly, and $\bar{W}$ and $S$ have to be recalculated. To avoid recalculating $\bar{W}$ when updating $S$, Eilon and Chowdhury derived a formula to calculate $S$ from successive differences. By definition, the successive differences of the waiting times are the processing times; that is, $d_i = W_{i+1} - W_i = p_i$, $i = 1, \ldots, n-1$. So,

$$nS = \sum_{i=1}^{n-1} w_{ii}\, p_i^2 + 2\sum_{i=1}^{n-2}\sum_{j=i+1}^{n-1} w_{ij}\, p_i p_j, \tag{3}$$

where

$$w_{ij} = i(n-j), \quad i \le j. \tag{4}$$

Equation (3) is not a general formula for calculating $S$, as $W_1$ is zero. Vani and Raghavachari [4] gave a more general formula by considering the jobs' completion times rather than waiting times. Let $C_i$ be the completion time of job $i$. They rewrote (3) as follows:

$$nS = \sum_{i=1}^{n-1}\sum_{j=1}^{n-1} w_{ij}\, d_i d_j, \tag{5}$$

where $d_i = C_{i+1} - C_i$ for $i = 1, \ldots, n-1$ and

$$w_{ij} = \begin{cases} j(n-i), & i \ge j, \\ i(n-j), & i < j. \end{cases} \tag{6}$$

In an independent work, Joarder [3] also derived a formula similar to (5). He then converted its double sum structure into a quadratic form wherein

$$nS = \mathbf{d}' W \mathbf{d}, \tag{7}$$

where $\mathbf{d} = (d_1, \ldots, d_{n-1})'$ is a vector of the successive differences and $W = (w_{ij})$ is a weight matrix with $w_{ij}$ as defined in (6).
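The scheduling setting can be illustrated with a short Python sketch. It assumes that the two-formula weights are $j(n-i)$ on and below the diagonal and $i(n-j)$ above it, and that the weighted double sum equals $n$ times the sum of squared deviations; all function names and job data are hypothetical.

```python
def waiting_times(processing):
    """W_1 = 0 and W_i = W_{i-1} + p_{i-1}; the last job's processing time is unused."""
    waits = [0]
    for p in processing[:-1]:
        waits.append(waits[-1] + p)
    return waits


def two_case_weight(i, j, n):
    # Assumed two-formula weights with 1-based indices i, j = 1, ..., n - 1.
    return j * (n - i) if i >= j else i * (n - j)


def n_times_s_from_differences(d):
    """Weighted double sum over successive differences; no mean is computed."""
    n = len(d) + 1
    return sum(two_case_weight(i, j, n) * d[i - 1] * d[j - 1]
               for i in range(1, n) for j in range(1, n))


def n_times_s_direct(values):
    """n * sum((v - mean)^2), written as n * sum(v^2) - (sum v)^2 to stay in integers."""
    n = len(values)
    return n * sum(v * v for v in values) - sum(values) ** 2


schedule = [3, 1, 4, 2]   # hypothetical processing times of four jobs
swapped = [1, 3, 4, 2]    # the same jobs with the first two swapped

for jobs in (schedule, swapped):
    waits = waiting_times(jobs)
    diffs = jobs[:-1]     # successive differences of waiting times are processing times
    print(waits, n_times_s_from_differences(diffs), n_times_s_direct(waits))
```

After a swap, only the processing-time vector has to be reordered before the weighted sum is evaluated again; the mean waiting time never appears.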

One problem with the weight formula in (6) is that it is not in a unified format but has to be represented as two formulas. This deficiency precludes a compact representation that would facilitate further derivations. To solve this problem, we derive a unified weight formula for calculating the sample variance from weighted successive differences. Joarder [3] derived an updating formula to calculate a variance from weighted successive differences, but his formula contains a number of update terms that grows as observations are added. Using the unified weight formula, we show in [5] that Joarder's formula can be improved by reducing the update terms to a fixed number of two.

2. Main Results

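Theorem 1. Given a temporally ordered set of observations $x_1, x_2, \ldots, x_n$, the sum of squared differences about the mean can be represented as

$$nS = \mathbf{d}' W \mathbf{d}, \tag{8}$$

where $S = \sum_{i=1}^{n} (x_i - \bar{x})^2$, $\mathbf{d} = (d_1, \ldots, d_{n-1})'$, $d_i = x_{i+1} - x_i$ for $i = 1, \ldots, n-1$, and $W = (w_{ij})$ is the $(n-1) \times (n-1)$ symmetric matrix with

$$w_{ij} = \min(i, j)\,\bigl(n - \max(i, j)\bigr). \tag{9}$$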
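As a quick sanity check, the case $n = 3$ can be worked by hand. The block below is a sketch that assumes the unified weight has the form $w_{ij} = \min(i,j)\,(n - \max(i,j))$ and relies on the standard pairwise identity $nS = \sum_{i<j} (x_j - x_i)^2$.

```latex
% Worked case n = 3 with d_1 = x_2 - x_1 and d_2 = x_3 - x_2.
% Assumed weights: w_{11} = 1(3-1) = 2, w_{12} = w_{21} = 1(3-2) = 1, w_{22} = 2(3-2) = 2.
\begin{align*}
W &= \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \\
\mathbf{d}' W \mathbf{d} &= 2d_1^2 + 2d_1 d_2 + 2d_2^2, \\
\sum_{i<j} (x_j - x_i)^2 &= d_1^2 + d_2^2 + (d_1 + d_2)^2
  = 2d_1^2 + 2d_1 d_2 + 2d_2^2 .
\end{align*}
```

Both expressions agree, so the quadratic form reproduces $3S$ without the mean appearing anywhere.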

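Proof. First write $x_i = x_1 + \sum_{k=1}^{i-1} d_k$ for $i = 2, \ldots, n$. Now, $\mathbf{x} = (x_1, \ldots, x_n)'$ can be presented as

$$\mathbf{x} = x_1 \mathbf{1} + A\mathbf{d},$$

where $\mathbf{1}$ is the $n \times 1$ vector of ones and $A$ is an $n \times (n-1)$ matrix; that is, the row $i$ column $j$ element of $A$ is $a_{ij} = 1$ for $j < i$ and $a_{ij} = 0$ otherwise.

Next, the mean of $\mathbf{x}$ can be written as

$$\bar{x} = x_1 + \frac{1}{n}\sum_{j=1}^{n-1} (n-j)\, d_j .$$

In vector form this is $\bar{x}\mathbf{1} = x_1 \mathbf{1} + B\mathbf{d}$, where the row $i$ column $j$ element of $B$ is $b_{ij} = (n-j)/n$. Now observe that

$$S = (\mathbf{x} - \bar{x}\mathbf{1})'(\mathbf{x} - \bar{x}\mathbf{1}) = \mathbf{d}'(A - B)'(A - B)\mathbf{d} = \mathbf{d}'(A'A - A'B - B'A + B'B)\mathbf{d}.$$

Thus we need to obtain expressions for calculating $A'A$, $A'B$, $B'A$, and $B'B$.

First, $A'A = (u_{ij})$, where

$$u_{ij} = \sum_{k=1}^{n} a_{ki} a_{kj} = n - \max(i, j).$$

Then $A'B = (v_{ij})$, where

$$v_{ij} = \sum_{k=1}^{n} a_{ki} b_{kj} = \frac{(n-i)(n-j)}{n}.$$

That is, $A'B$ is symmetric, so $B'A = (A'B)' = A'B$. Finally, $B'B = (t_{ij})$, where

$$t_{ij} = \sum_{k=1}^{n} b_{ki} b_{kj} = \frac{(n-i)(n-j)}{n}.$$

We now can see that $A'B = B'A = B'B$ and, hence, $(A - B)'(A - B) = A'A - B'B$. A direct calculation produces $w_{ij}$ as follows:

$$w_{ij} = n(u_{ij} - t_{ij}) = n\bigl(n - \max(i,j)\bigr) - (n-i)(n-j) = n\bigl(i + j - \max(i,j)\bigr) - ij = \min(i,j)\,\bigl(n - \max(i,j)\bigr),$$

where the last step uses $i + j = \min(i,j) + \max(i,j)$ and $ij = \min(i,j)\max(i,j)$. Thus, $nS = \mathbf{d}' W \mathbf{d}$, and the proof is complete.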
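The matrix step of the argument can be checked numerically with exact rational arithmetic. The following is a sketch under stated assumptions: `A` is taken to be the matrix that accumulates successive differences into the observations, `B` the matrix whose identical rows hold the coefficients of the mean, and the unified weight is assumed to be $\min(i,j)\,(n-\max(i,j))$. The names `A`, `B`, `scaled_gram`, and `unified_weights` are hypothetical labels for this illustration.

```python
from fractions import Fraction


def scaled_gram(n):
    """Return n * (A - B)'(A - B) as an (n-1) x (n-1) list of Fractions."""
    rows, cols = range(n), range(n - 1)
    # A[r][c] = 1 when observation r+1 contains difference c+1 (0-based r, c).
    A = [[Fraction(1 if c < r else 0) for c in cols] for r in rows]
    # Every row of B holds the mean's coefficients (n - j) / n with j = c + 1.
    B = [[Fraction(n - 1 - c, n) for c in cols] for r in rows]
    M = [[A[r][c] - B[r][c] for c in cols] for r in rows]
    return [[n * sum(M[r][i] * M[r][j] for r in rows) for j in cols] for i in cols]


def unified_weights(n):
    """Assumed unified weight min(i, j) * (n - max(i, j)) for i, j = 1, ..., n - 1."""
    return [[min(i, j) * (n - max(i, j)) for j in range(1, n)] for i in range(1, n)]


# The two constructions should agree entry by entry for every sample size tried.
for n in range(2, 9):
    assert scaled_gram(n) == unified_weights(n)

print([[int(v) for v in row] for row in scaled_gram(4)])
```

Using `Fraction` avoids floating-point noise, so the comparison is an exact check of the identity for the sample sizes tried rather than an approximate one.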

3. Numerical Example

This section gives a numerical example to illustrate sample variance calculation using the nonunified weight formula given in (6) and the unified formula given in (9). We take a sample data set , , , and from Ross [6, Page 145] where the data are used to illustrate the variance updating process using the one-pass algorithm proposed in van Reeken [7]. The successive differences for this data set are and the successive differences vector is . Using the nonunified weight formula, the weight matrix is constructed using two formulas: one for the lower triangular matrix and one for the strictly upper triangular matrix. For the lower triangular matrix with , . For example, , , , and . For the strictly upper triangular matrix with , the weight formula is . For example, , , and . Combining the lower triangular matrix and the strictly upper triangular matrix we can get The variance is then calculated as

Now with our approach, the weight matrix is constructed using the unified weight formula given in (9). For example, and . Similarly, and are calculated as and . The other weights are calculated in a similar manner to produce The variance is then calculated as
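The two constructions can also be compared in a few lines of Python. This sketch does not use the data from Ross [6]; the four values below are hypothetical, the two-formula weights are assumed to be $j(n-i)$ for $i \ge j$ and $i(n-j)$ for $i < j$, and the unified weight is assumed to be $\min(i,j)\,(n-\max(i,j))$.

```python
from statistics import variance


def two_formula_weights(n):
    """One formula for the lower triangle (i >= j), another for the strict upper triangle."""
    return [[j * (n - i) if i >= j else i * (n - j) for j in range(1, n)]
            for i in range(1, n)]


def unified_weights(n):
    """A single expression covering every entry of the weight matrix."""
    return [[min(i, j) * (n - max(i, j)) for j in range(1, n)] for i in range(1, n)]


def variance_from_differences(x, weights):
    """Sample variance as d'Wd / (n (n - 1)); the mean is never computed."""
    n = len(x)
    d = [b - a for a, b in zip(x, x[1:])]
    quad = sum(weights[i][j] * d[i] * d[j]
               for i in range(n - 1) for j in range(n - 1))
    return quad / (n * (n - 1))


data = [4, 7, 13, 16]  # hypothetical observations, not the values used in the text
n = len(data)

print(two_formula_weights(n))
print(unified_weights(n))
print(variance_from_differences(data, unified_weights(n)), variance(data))
```

For these hypothetical values both constructions yield the same weight matrix, and the resulting variance agrees with the one computed by the standard library from the usual definition.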

4. Conclusions

Sample variance calculation using weighted successive differences is advantageous from a computational perspective, as it avoids a mean calculation that may introduce rounding errors. However, the weight formula proposed in previous research is not in a unified format. Instead, it has to be represented as two formulas. This deficiency precludes a compact representation that would facilitate further derivations. This paper derives a unified weight formula for calculating a sample variance from weighted successive differences. We have employed this unified formula to improve the variance updating formulas of Vani and Raghavachari [4] and Joarder [3].

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.