#### Abstract

Coefficient alpha is the most commonly used internal consistency reliability coefficient. Alpha is the mean of all possible $k$-split alphas if the items are divided into $k$ parts of equal size. This result gives proper interpretations of alpha: interpretations that also hold if (some of) its assumptions are not valid. Here we consider the cases where the items cannot be split into parts of equal size. It is shown that if a $k$-split is made such that the items are divided as evenly as possible, the difference between alpha and the mean of all possible $k$-split alphas can be made arbitrarily small by increasing the number of items.

#### 1. Introduction

In test theory, test scores are used to summarize the performance of participants on a test. An important property of a test score is its reliability, which concerns the precision with which a participant’s score is measured. Reliability can be conceptualized in different ways. In layman’s terms, a test score is said to be reliable if it produces similar outcomes for participants under similar administration conditions. A more formal definition of reliability comes from classical test theory; namely, reliability is defined as the ratio of the true score variance and the total score variance [1–3]. The true score variance cannot be observed directly. Therefore, it has to be estimated from the data. If there is only one test administration, reliability can be estimated using so-called internal consistency reliability coefficients [4]. The internal consistency coefficient that is most commonly used in psychology and other behavioral sciences is coefficient alpha [5–10].

Coefficient alpha was already discussed in Guttman [11]. The coefficient was later popularized by Cronbach [6]. Since then it has been applied in thousands of research studies [5, 9]. It has been pointed out that alpha is not a measure of the one-dimensionality of a test score [6, 9, 12]. Furthermore, a problem is that the value of alpha is affected by the length of the test. Alpha is often high if the test consists of 15 items or more [5, 13, 14]. Moreover, the derivation of alpha is based on several assumptions from classical test theory. For example, alpha is an estimate of the reliability if the items of the test are essentially tau-equivalent [1, 2]. These assumptions may not hold in practice. Various authors have therefore studied alpha’s robustness to violations of its assumptions [12, 15–17].

Other authors have presented results that give a meaning to alpha and to its estimate from a sample in case its assumptions do not hold [6, 18]. The items of a test can be split into two parts. This will be called a 2-split. For each part, the sum score of the items can be determined. If we apply Cronbach’s alpha to the two sum scores, we obtain the Flanagan-Rulon split-half estimate of the reliability [19, 20]. The split-half reliability is an alternative approach to estimating the reliability of a test when there is only one administration [3, 7, 8]. The problem with a 2-split of a test is that there are multiple ways to divide the items into two parts. The value of the 2-split alpha therefore depends on the way the 2-splits are made [7]. Cronbach [6] showed that if the 2-splits are made such that the parts have equal size, then the overall alpha is the mean of all possible 2-split alphas. Cronbach’s result was generalized by Raju [18] to other types of splits. Let $k \geq 2$ be an integer. Raju [18] showed that if the $k$-split is such that the parts have the same number of items, then alpha is the mean of all possible $k$-split alphas.

The algebraic results by Cronbach [6] and Raju [18] are important since they provide a proper interpretation of alpha and of its estimate from a sample in the case that its assumptions do not hold [2, 5]. Thus, if the number of items is a composite number $n$, then there exists an interpretation of alpha for any proper divisor $k$ of $n$. Raju [18] also showed that if the parts do not have equal size, then the mean of all $k$-split alphas is strictly smaller than the overall alpha. Moreover, this difference can be substantial [18]. Thus, at present, it is unknown if there is an (approximate) interpretation of alpha as a mean of $k$-split alphas in case the number of items is a prime number. This is the topic of investigation of this paper.

The paper is organized as follows. Notation and definitions are presented in Section 2. We also present a formula of the mean of all possible $k$-split alphas in Section 2. In Section 3, the nonnegative difference between alpha and the mean of all $k$-split alphas is studied. It is shown that if we divide the items as evenly as possible, the difference is strictly decreasing in the number of items. Furthermore, the difference goes to zero if the number of items increases. Hence, the difference can be made arbitrarily small by increasing the number of items. In Section 4, we determine for three criteria and for small values of $k$ how many items are needed. Finally, Section 5 contains a conclusion.

#### 2. Notation and Definitions

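Suppose we have a test that consists of $n$ items. Let $\sigma_{ij}$, for $i, j \in \{1, 2, \ldots, n\}$ with $i \neq j$, denote the covariance between items $i$ and $j$, and let $\sigma_X^2$ denote the variance of the total score. Guttman’s lambda3 or Cronbach’s alpha is defined as
$$\alpha = \frac{n}{n-1} \cdot \frac{\sum_{i \neq j} \sigma_{ij}}{\sigma_X^2}. \tag{1}$$
In the summation both counters $i$ and $j$ run from 1 to $n$, but they are never equal.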
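As a concrete illustration of this definition, the following minimal Python sketch computes alpha from an item covariance matrix as $n/(n-1)$ times the sum of the off-diagonal covariances divided by the total score variance. The function name `cronbach_alpha` and the example matrix are hypothetical and are not taken from the paper; NumPy is assumed to be available.

```python
import numpy as np

def cronbach_alpha(cov):
    """Coefficient alpha from an n x n item covariance matrix."""
    cov = np.asarray(cov, dtype=float)
    n = cov.shape[0]
    total_var = cov.sum()                 # variance of the total (sum) score
    off_diag = total_var - np.trace(cov)  # sum of covariances over all i != j
    return n / (n - 1) * off_diag / total_var

# Hypothetical example: 4 items with unit variances and all covariances 0.5.
cov = np.full((4, 4), 0.5) + 0.5 * np.eye(4)
print(cronbach_alpha(cov))  # approximately 0.8
```

The total score variance is obtained as the sum of all entries of the covariance matrix, so no raw item scores are needed.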

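Next, suppose we split the test into $k$ parts of sizes $n_1, n_2, \ldots, n_k$ with $n_i \geq 1$ for $i \in \{1, 2, \ldots, k\}$ and
$$\sum_{i=1}^{k} n_i = n. \tag{2}$$
Let $p_i = n_i / n$ for $i \in \{1, 2, \ldots, k\}$ denote the proportion of items in part $i$. Furthermore, let $\sigma_{P_i P_j}$ denote the covariance between the sum scores of parts $i$ and $j$. For a $k$-split coefficient alpha is defined as [18]
$$\alpha_k = \frac{k}{k-1} \cdot \frac{\sum_{i \neq j} \sigma_{P_i P_j}}{\sigma_X^2}. \tag{3}$$
We have $\alpha_n = \alpha$, since for $k = n$ each part consists of a single item. Furthermore, for fixed $n$ and fixed part sizes $n_1, n_2, \ldots, n_k$, the number of different $k$-splits is given by [21, page 823]
$$\frac{n!}{n_1! \, n_2! \cdots n_k!}. \tag{4}$$
For each $k$-split, there is an associated $\alpha_k$. The mean of all $k$-split alphas is denoted by $\bar{\alpha}_k$. An expression for $\bar{\alpha}_k$ is presented in Lemma 1. The result is not derived in Raju [18] and is believed to be new.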

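Lemma 1. *It holds that*
$$\bar{\alpha}_k = \frac{k}{k-1} \cdot \frac{n \left(1 - \sum_{i=1}^{k} p_i^2\right)}{n-1} \cdot \frac{\sum_{i \neq j} \sigma_{ij}}{\sigma_X^2}. \tag{5}$$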

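*Proof.* The variance of the total score $\sigma_X^2$ is not affected by the $k$-split. Furthermore, $k/(k-1)$ is fixed since $k$ is fixed. Hence,
$$\bar{\alpha}_k = \frac{k}{k-1} \cdot \frac{\overline{\sum_{i \neq j} \sigma_{P_i P_j}}}{\sigma_X^2}, \tag{6}$$
where, using (4), the mean over all $k$-splits is
$$\overline{\sum_{i \neq j} \sigma_{P_i P_j}} = \frac{n_1! \, n_2! \cdots n_k!}{n!} \sum_{\text{all } k\text{-splits}} \; \sum_{i \neq j} \sigma_{P_i P_j}. \tag{7}$$
We want to determine how often each covariance $\sigma_{ij}$ between items $i$ and $j$ (counting $\sigma_{ij}$ and $\sigma_{ji}$ as the same covariance) occurs in the summation on the right-hand side of (7). Raju [18, page 559] showed that this amount is given by
$$\frac{2 \, n^2 \, (n-2)!}{n_1! \, n_2! \cdots n_k!} \left(1 - \sum_{i=1}^{k} p_i^2\right). \tag{8}$$
Since each covariance $\sigma_{ij}$ occurs twice in the summation $\sum_{i \neq j} \sigma_{ij}$, (7) is given by
$$\overline{\sum_{i \neq j} \sigma_{P_i P_j}} = \frac{n \left(1 - \sum_{i=1}^{k} p_i^2\right)}{n-1} \sum_{i \neq j} \sigma_{ij}. \tag{9}$$
Using (9) in (6), we obtain (5).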
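Lemma 1 can be checked by brute force on a small test. The sketch below enumerates every $k$-split with fixed part sizes, applies alpha to the part sum scores, and compares the mean with alpha multiplied by $k(1 - \sum p_i^2)/(k-1)$, which is what Lemma 1 gives once the definition of alpha is substituted. The covariance matrix is arbitrary simulated data, and the names `all_splits` and `split_alpha` are hypothetical helpers introduced here for illustration only.

```python
from itertools import combinations
import numpy as np

def cronbach_alpha(cov):
    """Alpha from an item covariance matrix."""
    n = cov.shape[0]
    total_var = cov.sum()
    return n / (n - 1) * (total_var - np.trace(cov)) / total_var

def all_splits(items, sizes):
    """Yield every assignment of `items` to labelled parts of the given sizes."""
    if not sizes:
        yield []
        return
    for part in combinations(items, sizes[0]):
        rest = [i for i in items if i not in part]
        for tail in all_splits(rest, sizes[1:]):
            yield [list(part)] + tail

def split_alpha(cov, parts):
    """Alpha applied to the sum scores of the parts of one k-split."""
    k = len(parts)
    total_var = cov.sum()
    within = sum(cov[np.ix_(p, p)].sum() for p in parts)  # variances of part sums
    return k / (k - 1) * (total_var - within) / total_var

# Arbitrary 5-item covariance matrix from simulated scores (illustrative only).
rng = np.random.default_rng(0)
cov = np.cov(rng.normal(size=(50, 5)), rowvar=False)

sizes = (2, 2, 1)                     # an uneven 3-split of 5 items
n, k = sum(sizes), len(sizes)
alphas = [split_alpha(cov, parts) for parts in all_splits(list(range(n)), sizes)]
mean_alpha = float(np.mean(alphas))

sum_p2 = sum((s / n) ** 2 for s in sizes)
predicted = cronbach_alpha(cov) * k * (1 - sum_p2) / (k - 1)
print(len(alphas), mean_alpha, predicted)  # 30 splits; the two means should agree
```

The enumeration treats the parts as labelled, so the number of splits it visits equals the multinomial count $n!/(n_1! \cdots n_k!)$, here $5!/(2!\,2!\,1!) = 30$.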

#### 3. Difference between $\alpha$ and $\bar{\alpha}_k$

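In this section, we study the difference between alpha and the mean of all $k$-split alphas. Using (1), we can write (5) as
$$\bar{\alpha}_k = \alpha \cdot \frac{k \left(1 - \sum_{i=1}^{k} p_i^2\right)}{k-1}. \tag{10}$$
Cronbach [6] showed that $\bar{\alpha}_2 = \alpha$ if $p_1 = p_2 = 1/2$. A proof can also be found in Lord and Novick [1]. Raju [18, page 560] showed that $\bar{\alpha}_k = \alpha$ if $p_1 = p_2 = \cdots = p_k = 1/k$ and that $\bar{\alpha}_k < \alpha$ if $p_1, p_2, \ldots, p_k$ are not all equal. Hence, the difference
$$\alpha - \bar{\alpha}_k = \alpha \left[1 - \frac{k \left(1 - \sum_{i=1}^{k} p_i^2\right)}{k-1}\right] \tag{11}$$
is nonnegative. Difference (11) can be used to study how close the values of alpha and the mean of all $k$-split alphas are. Since $\alpha \leq 1$, we have the inequality
$$\alpha - \bar{\alpha}_k \leq 1 - \frac{k \left(1 - \sum_{i=1}^{k} p_i^2\right)}{k-1}. \tag{12}$$
The right-hand side of (12) is an upper bound of $\alpha - \bar{\alpha}_k$ for all values of $\alpha$ ($\alpha \leq 1$). It only depends on $k$ and the proportions of items in the parts.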

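It turns out that difference (11) can be quite large [18]. This is the case when the variance between $p_1, p_2, \ldots, p_k$ is large, that is, if $p_1, p_2, \ldots, p_k$ are very different. To illustrate this property of difference (11), suppose we have $n = k - 1 + m$ items, where $m \geq 1$ is an integer. Furthermore, suppose we distribute the items such that we have $k - 1$ parts with 1 item and 1 part with $m$ items. The proportions of items per part are given by
$$p_1 = p_2 = \cdots = p_{k-1} = \frac{1}{k-1+m}, \qquad p_k = \frac{m}{k-1+m}. \tag{13}$$
Using the proportions in (13) we have
$$\frac{k \left(1 - \sum_{i=1}^{k} p_i^2\right)}{k-1} = \frac{k \, (2m + k - 2)}{(k-1+m)^2}. \tag{14}$$
Due to the $m^2$-term in the denominator of the right-hand side of (14) we have, for fixed $k$, that $\bar{\alpha}_k \to 0$, and hence $\alpha - \bar{\alpha}_k \to \alpha$, as $m \to \infty$.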
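As a worked instance of this effect (the numbers are chosen here for illustration and do not come from the paper), take $k = 2$ and ten items split into one part with 1 item and one part with 9 items, so that $p_1 = 0.1$ and $p_2 = 0.9$. The factor that multiplies alpha to give the mean of all such 2-split alphas is then

$$\frac{k \left(1 - \sum_{i=1}^{k} p_i^2\right)}{k-1} = \frac{2 \, (1 - 0.01 - 0.81)}{1} = 0.36 .$$

For splits of this shape the mean 2-split alpha is therefore only $0.36\,\alpha$, whereas an even 5/5 split of the same ten items gives the factor $2\,(1 - 0.25 - 0.25) = 1$.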

In this paper, we are not interested in an arbitrary $k$-split of the items but in a $k$-split in which the items are distributed as evenly as possible over the parts. Suppose we have $n = km + r$ items where $m$, $r$ are integers with $1 \leq r \leq k - 1$. If we want to distribute the items as evenly as possible over the parts, an optimal $k$-split is to have $r$ parts with $m + 1$ items and $k - r$ parts with $m$ items. In this case the proportions of items per part are given by
$$p_1 = \cdots = p_r = \frac{m+1}{km+r}, \qquad p_{r+1} = \cdots = p_k = \frac{m}{km+r}. \tag{15}$$
Lemma 2 shows that for a split of the type in (15) the quantity $1 - \sum_{i=1}^{k} p_i^2$ approaches the ratio $(k-1)/k$ as the counter $m$ goes to infinity.

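Lemma 2. *For split (15), it holds that*
$$\lim_{m \to \infty} \left(1 - \sum_{i=1}^{k} p_i^2\right) = \frac{k-1}{k}. \tag{16}$$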

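*Proof.* Using the proportions in (15), we have the identity
$$1 - \sum_{i=1}^{k} p_i^2 = 1 - \frac{r(m+1)^2 + (k-r)m^2}{(km+r)^2}. \tag{17}$$
Using the identity
$$r(m+1)^2 + (k-r)m^2 = km^2 + 2rm + r, \tag{18}$$
we can write (17) as
$$1 - \sum_{i=1}^{k} p_i^2 = 1 - \frac{km^2 + 2rm + r}{(km+r)^2}. \tag{19}$$
For $m \to \infty$, we have
$$\frac{km^2 + 2rm + r}{(km+r)^2} \to \frac{k}{k^2} = \frac{1}{k}, \tag{20}$$
so that (19) tends to $1 - 1/k = (k-1)/k$.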

It follows from Lemma 2 that the right-hand side of (12), and thus the difference $\alpha - \bar{\alpha}_k$, goes to zero as $m$ increases. In other words, if the $k$-split is of the type in (15), the difference between alpha and the mean of the best $k$-splits vanishes as the number of items becomes large. It is not clear from Lemma 2 that $1 - \sum_{i=1}^{k} p_i^2$ is strictly increasing in $m$. This is shown in Lemma 3.

Lemma 3. *For split (15), $1 - \sum_{i=1}^{k} p_i^2$ is strictly increasing in $m$.*

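*Proof.* Using (19), the first order partial derivative is
$$\frac{\partial}{\partial m} \left(1 - \sum_{i=1}^{k} p_i^2\right) = \frac{2r(k-r)}{(km+r)^3}. \tag{21}$$
Since $1 \leq r \leq k - 1$, (21) is strictly positive, and it follows that $1 - \sum_{i=1}^{k} p_i^2$ is strictly increasing in $m$.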
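The limit in Lemma 2 and the monotonicity in Lemma 3 can be illustrated numerically. The sketch below assumes the most even split of $n = km + r$ items, that is, $r$ parts with $m + 1$ items and $k - r$ parts with $m$ items, and uses exact rational arithmetic. The helper name `one_minus_sum_p2` and the choice $k = 3$, $r = 1$ are illustrative assumptions, not part of the paper.

```python
from fractions import Fraction

def one_minus_sum_p2(k, m, r):
    """1 - sum of squared proportions for the most even k-split of k*m + r items."""
    n = k * m + r
    sizes = [m + 1] * r + [m] * (k - r)   # r larger parts, k - r smaller parts
    return 1 - sum(Fraction(s, n) ** 2 for s in sizes)

k, r = 3, 1
values = [one_minus_sum_p2(k, m, r) for m in range(1, 200)]
limit = Fraction(k - 1, k)

# Lemma 3: each value is strictly larger than the previous one.
increasing = all(a < b for a, b in zip(values, values[1:]))
# Lemma 2: the values approach (k - 1) / k from below.
gap = float(limit - values[-1])
print(increasing, gap)
```

A finite check like this does not replace the proofs; it only confirms that the reconstructed quantities behave as the lemmas state for the chosen $k$ and $r$.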

#### 4. Numerical Results

Lemmas 2 and 3 from the previous section show that the nonnegative difference $\alpha - \bar{\alpha}_k$ decreases strictly to zero as $m$ increases, if the $k$-split is of the type in (15). Hence, for sufficiently large $m$, the difference can be made smaller than any specified criterion. In this section, we determine the numbers of items that are needed so that the difference is no more than 0.010, 0.005, and 0.001. Since a difference of 0.010 is negligible for most practical purposes, we may say that $\bar{\alpha}_k$ is approximately equal to $\alpha$ if the difference is no more than 0.010, which by (12) holds whenever
$$1 - \frac{k \left(1 - \sum_{i=1}^{k} p_i^2\right)}{k-1} \leq 0.010. \tag{22}$$
The values 0.005 and 0.001 can be used if one wants a stronger definition of approximately equal.

If we are interested in a $k$-split but the number of items is not a multiple of $k$, there are different scenarios to consider. For example, if $k = 3$ but the number of items is not a multiple of 3, we either have $n = 3m + 1$ items or have $n = 3m + 2$ items. Inequality (22) may hold for different $m$ depending on whether we have $3m + 1$ items or $3m + 2$ items. We therefore consider the number of items of all variants of a $k$-split separately. For all combinations of $k$ and $r$ considered, with $1 \leq r \leq k - 1$, we applied the following numerical strategy. For fixed $k$ and $r$, we increased $m$ in steps of 1, starting at 1, until inequality (22) was satisfied. The results for the values of $k$ considered are presented in Table 1. The table presents for each variant of a $k$-split the minimum number of items for which the difference is ≤0.010 (column 3), ≤0.005 (column 4), and ≤0.001 (column 5).

Table 2 summarizes for each $k$-split the minimum number of items required per criterion. For example, consider the case $k = 2$. If the number of items is even we have a perfect 2-split and $\bar{\alpha}_2 = \alpha$. If the number of items is odd and we distribute the items as evenly as possible, Table 1 shows that the difference is no more than 0.010 if we have at least 11 items, no more than 0.005 if we have at least 15 items, and no more than 0.001 if we have at least 33 items. Hence, we may conclude that in the case that we distribute the items as evenly as possible the difference is no more than 0.010 if we have at least 10 items, no more than 0.005 if we have at least 14 items, and no more than 0.001 if we have at least 32 items. These numbers are presented in Table 2.

As another example, consider the case $k = 3$. If we have $n = 3m$ items, we have a perfect 3-split and $\bar{\alpha}_3 = \alpha$. If we have $n = 3m + 1$ items, Table 1 shows that the difference is no more than 0.010 if we have at least 10 items, no more than 0.005 if we have at least 16 items, and no more than 0.001 if we have at least 34 items. If we have $n = 3m + 2$ items, Table 1 shows that the difference is no more than 0.010 if we have at least 11 items, no more than 0.005 if we have at least 17 items, and no more than 0.001 if we have at least 32 items. Hence, we may conclude that in the case that we distribute the items as evenly as possible, the difference is no more than 0.010 if we have at least 9 items, no more than 0.005 if we have at least 15 items, and no more than 0.001 if we have at least 32 items. These numbers are presented in Table 2.
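The search strategy described in this section can be sketched in Python as follows. The sketch assumes that the criterion is applied to the upper bound $1 - k(1 - \sum p_i^2)/(k-1)$ evaluated for the most even split, and that a bound exactly equal to the criterion counts as satisfying it. The function names `bound` and `min_items` are hypothetical. Exact fractions are used so that boundary cases (for example, a bound of exactly 0.010) are not affected by rounding.

```python
from fractions import Fraction

def bound(n, k):
    """Upper bound on alpha minus the mean k-split alpha for the most even k-split of n items."""
    m, r = divmod(n, k)
    sizes = [m + 1] * r + [m] * (k - r)        # r parts of m + 1 items, k - r parts of m items
    sum_p2 = sum(Fraction(s, n) ** 2 for s in sizes)
    return 1 - Fraction(k, k - 1) * (1 - sum_p2)

def min_items(k, r, criterion):
    """Smallest n = k*m + r with m >= 1 whose bound is at most `criterion`."""
    m = 1
    while bound(k * m + r, k) > criterion:
        m += 1                                  # increase m in steps of 1, as in the text
    return k * m + r

criteria = [Fraction(1, 100), Fraction(5, 1000), Fraction(1, 1000)]
for k, r in [(2, 1), (3, 1), (3, 2)]:
    print(k, r, [min_items(k, r, c) for c in criteria])
# 2 1 [11, 15, 33]
# 3 1 [10, 16, 34]
# 3 2 [11, 17, 32]
```

These values agree with the numbers quoted in the text for $k = 2$ and $k = 3$. Because the bound decreases strictly in $m$, the first $m$ that satisfies the criterion is also the point from which it holds for all larger $m$.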

#### 5. Conclusion

Alpha is the most commonly used coefficient for estimating reliability of a test score if there is only one test administration [5–9]. Cronbach [6] and Raju [18] have presented results that give a meaning to alpha and to its estimate from a sample, even if its assumptions do not hold. Cronbach [6] showed that if the 2-splits are made such that the parts have equal size, then alpha is the mean of all possible 2-split alphas. Raju [18] showed that if a $k$-split for $k \geq 2$ is made such that the parts have the same number of items, then alpha is the mean of all possible $k$-split alphas. The importance of these results lies in the fact that they provide a proper interpretation of alpha if (some of) its assumptions do not hold [2, 5].

In this paper, we consider the cases where the items cannot be split into parts of equal size. In these cases, the value of alpha exceeds the mean of all possible $k$-split alphas [18]. However, it was shown that if a $k$-split is made such that the items are divided as evenly as possible, the difference between alpha and the mean of all possible $k$-split alphas is negligible for sufficiently many items (Lemmas 2 and 3). Using numerical computations, it was shown, in the case of a 2-split, that the difference is no more than 0.010 if we have at least 10 items, no more than 0.005 if we have at least 14 items, and no more than 0.001 if we have at least 32 items and, in the case of a 3-split, that the difference is no more than 0.010 if we have at least 9 items, no more than 0.005 if we have at least 15 items, and no more than 0.001 if we have at least 32 items.

The results in Cronbach [6] and Raju [18] provide interpretations of alpha if the number of items is a composite number $n$, for any proper divisor $k$ of $n$. The results and numerical computations in this paper provide approximate interpretations of alpha if the number of items is prime.

#### Conflict of Interests

The author declares that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgment

This research was done while the author was funded by the Netherlands Organisation for Scientific Research, Veni Project 451-11-026.