#### Abstract

The repeatability, reproducibility, and sources of error inherent in a given measurement are important considerations for potential users. To quantify errors arising from a single operator or multiple laboratories, most testing standards use a one-way analysis of variance (ANOVA)-based method, which relies on a simple standard deviation across all measurements. However, this method does not allow users to quantify the sources of error or the measurement capacity (i.e., the precision-to-tolerance ratio). In this study, a two-way ANOVA-based analysis method is used to quantify the relative contributions of different sources of error and to determine whether a measurement can be used to check conformance of a measured characteristic to engineering specifications. The standardized Atterberg limits tests, fall cone Atterberg limits tests, and bar linear shrinkage tests, all widely used for determining soil plasticity, were selected for evaluation and demonstration. Comparisons between the results of the various testing methods are presented, and the error sources contributing to the overall variations between tests are discussed. Based on the findings of this study, the authors suggest using two-way ANOVA-based R&R analysis to quantify the sources of measurement error and capacity, and recommend using the fall cone device and the ASTM-standardized thread-rolling device for determining the liquid and plastic limits of soils, respectively.

#### 1. Introduction

The repeatability (i.e., the single-operator or intralaboratory precision) and reproducibility (i.e., the interlaboratory precision) of a measurement are important characteristics that can be quantified to help users understand the variability of test results. The ASTM E691 standard practice on interlaboratory testing states “ASTM standard regulations require precision statements in all test methods in terms of repeatability and reproducibility” and specifies a one-way ANOVA (i.e., a simple standard deviation across all measurements) to quantify the single-operator or multilaboratory errors [1]. However, the repeatability and reproducibility (R&R) statistics from the one-way ANOVA cannot quantify the contributions of multiple error sources to the overall variation in a measurement. Such information would be useful for identifying potential ways to further improve the test methods. For example, the design of the testing device may need to be improved, or the training of operators may need to be enhanced. The capacity of a measurement, defined as the ratio of precision to tolerance, is also an important parameter for determining whether a measurement is useful for checking conformance of a measured characteristic to engineering specifications. The ASTM standards (i.e., ASTM E177) use the 95% limit on the difference between two test results, referred to as the “d2s” limit (i.e., 1.960√2 × 1s ≈ 2.8 × 1s, where 1s is one standard deviation), to determine the acceptable range of two test results [2]. However, for a given testing method, the acceptable ranges calculated using this limit usually differ between the single-operator and multilaboratory test results because their standard deviations differ. In recent years, a growing number of new laboratory and field testing devices and methods have been developed for evaluating soil characteristics [3–5]. However, very few of these studies report the repeatability, reproducibility, and capacity of the newly developed testing methods in a way that considers errors arising from both the device and different operators.

To address these issues, the authors propose using a two-way ANOVA-based R&R analysis to evaluate the repeatability, reproducibility, sources of error, and capacity of a measurement. In this study, several widely used Atterberg limits testing devices are employed for comparison and demonstration of the statistical analysis. The ASTM-standardized Atterberg limits tests [6], fall cone test [7], and bar linear shrinkage test [8] were conducted on specimens prepared by incorporating different percentages of pure bentonite into the minus No. 40 fraction of crushed limestone samples from granular roadways. The R&R and capacity of the testing methods were determined using the two-way ANOVA-based analysis and compared to those determined using the one-way ANOVA methods described in ASTM E691-15 and ASTM E177-13 [1, 2]. Based on the results of the laboratory tests and statistical analyses, the method for quantifying repeatability, reproducibility, sources of error, and capacity of a given measurement is demonstrated, the testing methods with the best R&R are identified, and the correlations between the different tests are discussed.

#### 2. Various Tests for Determining the Soil Consistency

Swedish soil scientist Atterberg defined moisture content limits to delineate transitions in the consistency of fine-grained soils [9]. Atterberg initially set up five limits to describe the consistency of a soil at different water contents: (1) the upper limit of fluidity, (2) the lower limit of fluidity (flow limit), (3) the sticky limit, (4) the roll-out limit, and (5) the cohesion limit. Based on his laboratory evaluations, Atterberg established that a soil is plastic between the flow limit (liquid limit) and roll-out limit (plastic limit) and the plasticity number (plasticity index), which is the difference between the flow and roll-out limits, is the best measure of the plasticity of soils [10].

Since Terzaghi introduced Atterberg’s limits into modern soil mechanics practice and Casagrande standardized the testing devices, the liquid and plastic limit tests have been extensively performed in geotechnical engineering and soil science fields worldwide [11, 12]. To date, Atterberg limits remain a requirement for most soil classification systems, and they are used in many empirical models for predicting soil engineering properties and in specifications for controlling material properties [12–14]. However, many previous studies have demonstrated that the conventional Atterberg limits tests are highly operator dependent and thus produce significant variations in the test results [11, 15–18]. To provide more repeatable and reproducible test results, different devices and testing methods, including the fall cone and bar linear shrinkage tests, have been proposed and evaluated [8, 17, 18].

##### 2.1. Liquid Limit Test

In 1932, Casagrande developed a device to standardize the liquid limit (LL) test and in 1949 further refined the design to overcome inherent shortcomings [19]. The later design of the device is standardized in the current ASTM D4318-10 [6]. Although the Casagrande device has become ubiquitous in geotechnical testing, many previous studies have demonstrated that the device yields large variations in LL values. Factors responsible for the large variation include a strong dependency of the results on operator judgment, wear of the grooving tool, and variations in the hardness of the base materials of different devices [15–18]. Since the late 1950s, many studies have focused on alternative LL measurement methods, and several have concluded that the fall cone device, originally developed for testing bitumen materials, can eliminate most of the shortcomings of the Casagrande device and provide more consistent test results [17, 18]. Sowers et al. [18] evaluated the effects of cone angle, cone mass, and penetration time on the test results and concluded that the fall cone test is a promising method for measurement of LL. Haigh [16] concluded that the Casagrande cup and fall cone devices actually measure different physical properties of soils and reported that the fall cone test is a measure of specific strength which corresponds to a soil shear strength of ∼1.7 kPa, but the Casagrande cup test corresponds to a mean specific strength of ∼1.07 m^{2}/s^{2}. However, many studies have reported strong correlations between LL values determined by the Casagrande and fall cone test devices for a range of material types [15, 17, 20–25].

##### 2.2. Plastic Limit Test

The fall cone test device has also been evaluated for determining the plastic limit (PL) of soils [11, 26]. The data interpretations used in these studies were developed based on three assumptions: (1) the undrained shear strength (*C*_{u}) of a soil at its PL is approximately 100 times that at its LL [27]; (2) the relationship between moisture content (*w*) and the logarithm of *C*_{u} is linear based on critical state soil mechanics concepts [11, 28]; (3) *C*_{u}*d*^{2}/*W* is constant for the same cone geometry, where *d* is the fall cone penetration depth and *W* is the weight of the fall cone [11, 28]. Based on these three assumptions, Wroth and Wood [11] proposed determining the plasticity index (PI) of a soil by conducting fall cone tests with two different cone weights (*W*_{1} and *W*_{2}) to determine the water content separation (Δ*w*) of the two parallel flow lines as shown in Figure 1, from which the PI of the specimen can be calculated using the equation shown in the same figure.
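The equation from Figure 1 is not reproduced in the text. As a sketch only, the three assumptions above can be combined as follows, using Δ*w*, *W*_{1}, and *W*_{2} as defined in this section and an auxiliary slope λ introduced here for illustration; the result should be checked against Wroth and Wood [11] before use.

```latex
\begin{align*}
% Assumption (3): at equal penetration d, C_u is proportional to W,
% so the two flow lines differ in strength by the cone weight ratio.
\frac{C_{u,2}}{C_{u,1}} &= \frac{W_2}{W_1} \\
% Assumption (2): w is linear in log C_u, so the change in water
% content per log cycle of strength is
\lambda &= \frac{\Delta w}{\log_{10}\left(W_2 / W_1\right)} \\
% Assumption (1): LL to PL spans log10(100) = 2 log cycles of strength.
\mathrm{PI} = \mathrm{LL} - \mathrm{PL} &= 2\lambda
  = \frac{2\,\Delta w}{\log_{10}\left(W_2 / W_1\right)}
\end{align*}
```

For a cone weight ratio of 3, log_{10}(3) ≈ 0.477, so under these assumptions the PI would be roughly 4.2 times the water content separation.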

##### 2.3. Bar Linear Shrinkage Test

Paige-Green and Ventura concluded that the bar linear shrinkage (BLS) test result is a good indicator of the plasticity of soils [8]. The BLS specimen is prepared by mixing minus No. 40 material at a water content close to its LL, then transferring the material to a 150 mm long by 10 mm square trough and oven drying at 110°C until shrinkage stops. The shrinkage of the specimen is then measured and expressed as a percentage of the original specimen length, which is defined as the BLS value. Paige-Green and Ventura observed a linear correlation between BLS values and PI, with the PI values approximately two times the corresponding BLS values [8]. Paige-Green and Ventura [8] found that the BLS test is less susceptible to operator error and is much quicker and easier to learn and perform than the conventional Atterberg limits tests. To improve repeatability and reduce uneven shrinkage leading to bowing (bending) of the specimens, Sampson et al. recommended using a mold with openings on two sides instead of a trough and placing the specimen into the oven immediately after filling to reduce cracking [29]. Similar bar shrinkage tests can be found in several different testing standards, which are summarized in Table 1 along with their various trough dimensions, drying methods, and oven temperatures.

#### 3. Materials and Testing Methods

##### 3.1. Materials

In this study, a total of five samples were prepared by incorporating different percentages of pure bentonite powder into the minus No. 40 fraction of existing crushed limestone fines. The incorporated bentonite content by dry mass of the minus No. 40 material was increased from 0% to 12% in 3% increments. The sieve analysis, hydrometer analysis, and Atterberg limits test results of the initial full granular-surface material gradation are shown in Figure 2.

The chemical composition and mineralogy of the bentonite were determined using X-ray fluorescence (XRF) and X-ray diffraction (XRD) analyses, respectively. The XRD results showed that the bentonite was sodium montmorillonite (Na_{0.3}(Al,Mg)_{2}Si_{4}O_{10}(OH)_{2}·4H_{2}O) with calcite (CaCO_{3}) and quartz (SiO_{2}). The chemical composition determined by XRF is shown in Table 2, indicating that the primary chemical components are SiO_{2} and Al_{2}O_{3}.

The liquid and plastic limits of the bentonite determined using the methods of ASTM D4318 [6] were 297% and 35%, respectively. Following the recommendation from Bergeson and Wahbeh [30], a 0.5% sodium carbonate (i.e., soda ash) solution was used to increase the water content of the bentonite-treated samples, in order to disperse the bentonite particles and reach a more uniform consistency.

##### 3.2. Testing Methods

The conventional LL and PL tests were performed in accordance with ASTM D4318-10 [6]. The testing devices used in this study are shown in Figures 3(a) and 3(b).

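**Figure 3:** Testing devices used in this study: (a), (b) devices for the conventional LL and PL tests; (c) fall cone device; (d), (e) BLS molds.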

The fall cone LL test was conducted in accordance with British Standard 1377-2 [7] using the fall cone device shown in Figure 3(c). The cone has a mass of 80 g and an apex angle of 30°. The LL is determined as the moisture content corresponding to a cone penetration of 20 mm, using a best-fit straight line through the data points of moisture content vs. cone penetration plotted on linear scales.
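The interpolation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the standard's procedure: the function names and the readings are hypothetical, and the best-fit line is taken to be an ordinary least-squares regression of moisture content on penetration.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def fall_cone_liquid_limit(penetration_mm, moisture_pct, target_mm=20.0):
    """Moisture content (%) read from the best-fit line at the target penetration."""
    intercept, slope = fit_line(penetration_mm, moisture_pct)
    return intercept + slope * target_mm


# Hypothetical readings (not data from this study): four trials that
# bracket the 20 mm target penetration.
penetrations = [15.0, 18.0, 22.0, 25.0]   # cone penetration, mm
moistures = [30.0, 33.0, 37.0, 40.0]      # moisture content, %

ll_cone = fall_cone_liquid_limit(penetrations, moistures)
print(f"LL_cone = {ll_cone:.1f}%")
```

Choosing trial moisture contents so that the penetrations bracket 20 mm keeps the LL an interpolated rather than extrapolated value.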

To determine the PI using the fall cone device, the testing and calculation methods recommended by Wroth and Wood were followed [11]. As illustrated in Figure 1, the PI of the sample can be calculated based on two known cone weights (*W*_{1} and *W*_{2}) and the water content separation (Δ*w*) of the two flow lines at a cone penetration of 20 mm. Along with the fall cone LL tests using the 80 g cone, another set of tests was performed on the same samples using a 240 g cone for this purpose.
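A minimal Python sketch of this calculation is given below. It assumes the relation PI = log_{10}(100) × Δ*w* / log_{10}(*W*_{2}/*W*_{1}), which follows from the three assumptions listed in Section 2.2; the equation in Figure 1 is not reproduced in the text, so this form is an inference and not a quotation. The function name and the input values are hypothetical.

```python
import math


def fall_cone_plasticity_index(delta_w_pct, w1_g=80.0, w2_g=240.0,
                               strength_ratio=100.0):
    """PI (%) from the water content separation of two fall cone flow lines.

    delta_w_pct    : separation of the flow lines at 20 mm penetration (%)
    w1_g, w2_g     : the two cone masses (80 g and 240 g in this study)
    strength_ratio : assumed ratio of undrained strength at PL to that at LL
    """
    return math.log10(strength_ratio) * delta_w_pct / math.log10(w2_g / w1_g)


# Hypothetical inputs (not data from this study).
delta_w = 5.0    # water content separation, %
ll_cone = 35.0   # fall cone liquid limit, %

pi_cone = fall_cone_plasticity_index(delta_w)
pl_cone = ll_cone - pi_cone   # PL follows as LL minus PI

print(f"PI_cone = {pi_cone:.1f}%, PL_cone = {pl_cone:.1f}%")
```

Because the cone mass ratio enters through a logarithm, the 3:1 ratio used here multiplies Δ*w* by about 4.2, so small errors in the measured separation are amplified in the computed PI.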

The BLS test specimens were also prepared during the fall cone LL tests, because the initial water content of the BLS specimens should be close to the LL that results in a cone penetration of 20 mm. As recommended by Sampson et al., the aluminum BLS molds custom-fabricated for this study are open on two sides, with a length of 150 mm and a 10 mm by 10 mm square cross section (Figures 3(d) and 3(e)) [29]. The molds were first oven-heated at 110°C, then lubricated using a wax bar to reduce friction between the inside walls of the mold and soil specimen, which helps eliminate cracking and uneven or incomplete shrinkage. The wax-lined molds were filled with the soil specimens at moisture contents close to their LL and immediately placed in the oven at 110°C. After drying for 24 hours, the lengths of the specimens were measured using calipers. If a specimen was bowed, the arc height and chord length of the specimen were measured to calculate the average specimen length.
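The length correction for a bowed specimen and the resulting BLS value can be sketched as follows. The text does not state the formula used, so this sketch assumes the bowed bar forms a circular arc and that the original specimen length equals the 150 mm mold length; the function names and the measurements are hypothetical.

```python
import math


def arc_length_from_chord(chord_mm, arc_height_mm):
    """Length of a circular arc from its chord and arc height (sagitta)."""
    if arc_height_mm <= 0:
        return chord_mm  # straight specimen: length equals the chord
    radius = chord_mm ** 2 / (8.0 * arc_height_mm) + arc_height_mm / 2.0
    central_angle = 4.0 * math.atan(2.0 * arc_height_mm / chord_mm)
    return radius * central_angle


def bar_linear_shrinkage(dry_length_mm, initial_length_mm=150.0):
    """BLS (%) as shrinkage relative to the original specimen length."""
    return 100.0 * (initial_length_mm - dry_length_mm) / initial_length_mm


# Hypothetical caliper readings for a slightly bowed specimen.
chord = 141.0       # mm, straight-line distance between specimen ends
arc_height = 3.0    # mm, maximum offset of the specimen from the chord

dry_length = arc_length_from_chord(chord, arc_height)
bls = bar_linear_shrinkage(dry_length)
print(f"dry length = {dry_length:.2f} mm, BLS = {bls:.2f}%")
```

For shallow bows the correction is small (a fraction of a millimetre here), but ignoring it would slightly overestimate the shrinkage.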

In this study, a total of five samples were prepared at the same time to minimize possible variations caused by sample preparation. For each of the samples, three well-trained operators performed three replicate tests each. The three operators were all trained on all the different tests at the same time in order to minimize errors associated with the interoperator variability.

#### 4. Correlations and Variations between the Various Consistency Tests

##### 4.1. Liquid Limit by Fall Cone vs. Casagrande Cup

The correlation between the liquid limits determined using the Casagrande cup (LL_{cup}) and the fall cone (LL_{cone}) was determined using a total of 45 tests for each device (five bentonite contents times three operators times three replicates per operator). For each bentonite content, the average LL values from the replicate tests are shown in Figure 4, with error bars indicating the maximum and minimum values.

A strong linear correlation can be observed between the two testing methods. The best-fit line is very close to the 1 : 1 line, but on average, the fall cone test yields higher LL_{cone} values for LL_{cup} values below 33 and lower LL_{cone} values for LL_{cup} values above 33. Both tests yield progressively larger variations with the increasing LL values that result from increasing the bentonite content. However, the variations in the fall cone test results are much smaller than those of the Casagrande cup, as clearly demonstrated by the smaller range of the vertical error bars compared to the horizontal ones. The standard deviations of the test results determined using the two test methods are summarized in Table 3.

The linear correlations determined in the previous and present studies for different types of materials are summarized in Table 4. These linear correlations indicate that using the fall cone test to determine LL is promising, and the test results are very close to those of the conventional Casagrande cup test despite the different mechanisms of the two testing methods.

##### 4.2. Plastic Limit and Plasticity Index by Fall Cone vs. Conventional Method

Fall cone tests were also performed on the five bentonite-treated samples to determine the PI and thereby the PL, using the previously described method of Wroth and Wood [11]. The relationships between the PI determined using the fall cone (PI_{cone}) and the ASTM-standardized testing methods (PI_{ASTM}) involving rolling specimens into 3.2 mm diameter threads are shown in Figure 5(a).

**Figure 5:** (a) PI_{cone} vs. PI_{ASTM}; (b) PL_{cone} vs. PL_{roller}.

A linear correlation can be observed between the two testing methods, but the PI_{cone} values are approximately 40% greater than those determined by the ASTM test method using the ASTM plastic roller device. For the fall cone device, the plastic limit (PL_{cone}) was calculated by subtracting the PI_{cone} values from LL_{cone} values. The resulting plastic limits are compared with those from the conventional ASTM-standardized rolling device in Figure 5(b). Interestingly, the PL_{roller} values are approximately the same for the samples treated with different percentages of bentonite (from 0% to 12%), whereas the PL_{cone} values vary over a much wider range. This phenomenon may indicate that the PL determined using the conventional method is governed by the dominant material of the samples (i.e., the minus No. 40 sieved granular limestone material), and the fall cone test is more sensitive to the bentonite content. This observation warrants further study.

##### 4.3. Plasticity Index vs. Bar Linear Shrinkage Values

In this study, the BLS test was also conducted on the five samples with bentonite contents varying from 0 to 12%. The BLS test results compared with the PI values determined using the ASTM standard tests are shown in Figure 6. A linear correlation can be observed between the two parameters, and the variation of the BLS test results generally increases as the bentonite content increases.

However, as the bentonite content increases from 0% to 12%, the PI determined by the ASTM methods varies from 0 to 28%, whereas the BLS values vary over a much smaller range of 2% to 8%. This indicates that BLS values are much less sensitive than PI values to changes in plasticity. More importantly, the ranges of maximum and minimum values of PI (vertical error bars) for the different bentonite contents do not overlap, whereas most of the BLS ranges (horizontal error bars) do overlap. This means that a BLS measurement on the high end of the range for a bentonite content of 3%, for example, could have the same value as a BLS measurement on the low end of the range for a bentonite content of 12%. In both cases, substituting the BLS value into the linear equation for converting BLS to PI in Figure 6 would result in a significant error in the estimated PI. For this reason, the use of BLS as an alternative test method to obtain PI directly, instead of measuring LL and PL separately, is not recommended in this study.

#### 5. Two-Way ANOVA-Based Repeatability and Reproducibility Analysis

##### 5.1. Description of the Statistical Analysis Method

In this study, a two-way ANOVA-based R&R analysis was used to statistically quantify the repeatability, reproducibility, overall variability, and error sources of the various laboratory plasticity tests. This statistical analysis method is detailed in Vardeman and Jobe [31]. The input data for the analysis require *J* different operators to measure each of *I* different parts a total of *m* times. The two-way random effects model is represented by

$$y_{ijk} = \mu + \alpha_{i} + \beta_{j} + \alpha\beta_{ij} + \varepsilon_{ijk}, \tag{1}$$

where *y*_{ijk} is the *k*^{th} measurement made by operator *j* on part *i*, μ is a measurement averaged over all possible operators and all possible parts, α_{i} represents the random effects of different parts, β_{j} represents the random effects of different operators, αβ_{ij} represents the random joint effects specific to combinations of particular parts and operators, and ε_{ijk} is random measurement error.

The corresponding variances (σ_{α}^{2}, σ_{β}^{2}, σ_{αβ}^{2}, and σ^{2}) of the parameters in the model, called “variance components,” govern the variability of the measurements.

According to the random effects model, the only difference between different measurements for a specific combination of part and operator is the measurement error (ε_{ijk}), so its standard deviation (σ) is a measure of the repeatability in the model:

$$\sigma_{\text{repeatability}} = \sigma. \tag{2}$$

For a fixed part “*i*,” the value α_{i} is constant for different measurements, so the standard deviation of the operator bias for a fixed part, i.e., of β_{j} + αβ_{ij}, is an appropriate measure of reproducibility, which can be expressed as

$$\sigma_{\text{reproducibility}} = \sqrt{\sigma_{\beta}^{2} + \sigma_{\alpha\beta}^{2}}. \tag{3}$$

Therefore, the overall variation due to repeatability and reproducibility (σ_{R&R}) can be calculated as

$$\sigma_{R\&R} = \sqrt{\sigma_{\text{reproducibility}}^{2} + \sigma_{\text{repeatability}}^{2}} = \sqrt{\sigma_{\beta}^{2} + \sigma_{\alpha\beta}^{2} + \sigma^{2}}. \tag{4}$$

To obtain the parameters used in this model, a two-way ANOVA table such as Table 5 can be determined based on the test data using a statistical software package.
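For a balanced design, the link between the mean squares in such a table and the variance components can be sketched through the standard expected mean squares of the two-way random effects model. This is a textbook result stated here for orientation. MSA, MSB, MSAB, and MSE are assumed to denote the part, operator, part × operator interaction, and error mean squares, and Table 5 may label them differently.

```latex
\begin{align*}
E[\mathrm{MSE}]  &= \sigma^{2} \\
E[\mathrm{MSAB}] &= \sigma^{2} + m\,\sigma_{\alpha\beta}^{2} \\
E[\mathrm{MSB}]  &= \sigma^{2} + m\,\sigma_{\alpha\beta}^{2} + mI\,\sigma_{\beta}^{2} \\
E[\mathrm{MSA}]  &= \sigma^{2} + m\,\sigma_{\alpha\beta}^{2} + mJ\,\sigma_{\alpha}^{2}
\end{align*}
```

Solving the first three relations for the sum of the operator and interaction variance components gives MSB/(*mI*) + (*I* − 1)MSAB/(*mI*) − MSE/*m*, which is why the part mean square MSA does not enter the R&R estimates.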

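The part and operator factors (with *I* and *J* levels, respectively) should be set as nominal variables for the two-way ANOVA. With MSB, MSAB, and MSE denoting the mean squares for operators, the part × operator interaction, and error, respectively, the three standard deviations can then be estimated as follows:

$$
\begin{aligned}
\hat{\sigma}_{\text{repeatability}} &= \sqrt{\mathrm{MSE}}, \\
\hat{\sigma}_{\text{reproducibility}} &= \sqrt{\max\left(0,\ \frac{\mathrm{MSB}}{mI} + \frac{(I-1)}{mI}\,\mathrm{MSAB} - \frac{1}{m}\,\mathrm{MSE}\right)}, \\
\hat{\sigma}_{R\&R} &= \sqrt{\frac{1}{mI}\,\mathrm{MSB} + \frac{(I-1)}{mI}\,\mathrm{MSAB} + \frac{(m-1)}{m}\,\mathrm{MSE}}.
\end{aligned}
\tag{5}
$$

The degrees of freedom of the three quantities can be approximately determined using the Satterthwaite method [32] as

$$
\begin{aligned}
\hat{\nu}_{\text{repeatability}} &= IJ(m-1), \\
\hat{\nu}_{\text{reproducibility}} &= \frac{\hat{\sigma}_{\text{reproducibility}}^{4}}{\dfrac{\left(\frac{\mathrm{MSB}}{mI}\right)^{2}}{J-1} + \dfrac{\left(\frac{(I-1)\,\mathrm{MSAB}}{mI}\right)^{2}}{(I-1)(J-1)} + \dfrac{\left(\frac{\mathrm{MSE}}{m}\right)^{2}}{IJ(m-1)}}, \\
\hat{\nu}_{R\&R} &= \frac{\hat{\sigma}_{R\&R}^{4}}{\dfrac{\left(\frac{\mathrm{MSB}}{mI}\right)^{2}}{J-1} + \dfrac{\left(\frac{(I-1)\,\mathrm{MSAB}}{mI}\right)^{2}}{(I-1)(J-1)} + \dfrac{\left(\frac{(m-1)\,\mathrm{MSE}}{m}\right)^{2}}{IJ(m-1)}}.
\end{aligned}
\tag{6}
$$

The corresponding confidence limits for each of the quantities can be calculated based on the Chi-squared distribution (χ^{2}) using

$$
\left(\hat{\sigma}\sqrt{\frac{\hat{\nu}}{\chi^{2}_{\hat{\nu},\,\text{upper}}}},\ \ \hat{\sigma}\sqrt{\frac{\hat{\nu}}{\chi^{2}_{\hat{\nu},\,\text{lower}}}}\right),
\tag{7}
$$

where the two χ^{2} values are the upper and lower percentage points of the Chi-squared distribution with the estimated degrees of freedom.

The contributions of σ_{repeatability} and σ_{reproducibility} to σ_{R&R} are quantified using

$$
\frac{\sigma_{\text{repeatability}}^{2}}{\sigma_{R\&R}^{2}} \quad \text{and} \quad \frac{\sigma_{\text{reproducibility}}^{2}}{\sigma_{R\&R}^{2}}.
\tag{8}
$$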
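The calculation chain described in this section can be sketched in pure Python as a stand-in for a statistical software package. The sketch assumes a balanced design with at least two replicates per cell and uses the estimators given in Vardeman and Jobe [31] as the author of this sketch understands them; the function name `gauge_rr`, the dictionary keys, and the data are hypothetical.

```python
import math


def gauge_rr(y):
    """Two-way ANOVA-based R&R estimates for balanced data y[i][j][k].

    i indexes parts (I), j indexes operators (J), k indexes replicates (m).
    """
    I, J, m = len(y), len(y[0]), len(y[0][0])

    # Cell, part, operator, and grand means.
    cell = [[sum(y[i][j]) / m for j in range(J)] for i in range(I)]
    part = [sum(cell[i]) / J for i in range(I)]
    oper = [sum(cell[i][j] for i in range(I)) / I for j in range(J)]
    grand = sum(part) / I

    # Sums of squares for operators (B), interaction (AB), and error (E).
    ss_b = m * I * sum((o - grand) ** 2 for o in oper)
    ss_ab = m * sum((cell[i][j] - part[i] - oper[j] + grand) ** 2
                    for i in range(I) for j in range(J))
    ss_e = sum((v - cell[i][j]) ** 2
               for i in range(I) for j in range(J) for v in y[i][j])

    df_b, df_ab, df_e = J - 1, (I - 1) * (J - 1), I * J * (m - 1)
    ms_b, ms_ab, ms_e = ss_b / df_b, ss_ab / df_ab, ss_e / df_e

    # Weighted mean squares that enter the variance estimates.
    w_b = ms_b / (m * I)
    w_ab = (I - 1) * ms_ab / (m * I)
    var_repro = max(0.0, w_b + w_ab - ms_e / m)
    var_rr = w_b + w_ab + (m - 1) * ms_e / m

    def satterthwaite(var, terms):
        """Approximate degrees of freedom for a sum of weighted mean squares."""
        denom = sum(t ** 2 / df for t, df in terms)
        return var ** 2 / denom if denom > 0 else float("nan")

    return {
        "sigma_repeatability": math.sqrt(ms_e),
        "sigma_reproducibility": math.sqrt(var_repro),
        "sigma_rr": math.sqrt(var_rr),
        "df_repeatability": df_e,
        "df_reproducibility": satterthwaite(
            var_repro, [(w_b, df_b), (w_ab, df_ab), (ms_e / m, df_e)]),
        "df_rr": satterthwaite(
            var_rr, [(w_b, df_b), (w_ab, df_ab), ((m - 1) * ms_e / m, df_e)]),
        "fraction_reproducibility": var_repro / var_rr if var_rr > 0 else float("nan"),
    }


# Hypothetical data: 2 parts x 2 operators x 2 replicates.
data = [
    [[10.0, 12.0], [14.0, 16.0]],   # part 1: operator 1, operator 2
    [[20.0, 22.0], [24.0, 26.0]],   # part 2: operator 1, operator 2
]
result = gauge_rr(data)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Confidence limits are left out because they need Chi-squared quantiles, which the Python standard library does not provide; a statistics library such as SciPy (`scipy.stats.chi2.ppf`) could supply them.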

##### 5.2. Results of Repeatability and Reproducibility Analysis

The two-way ANOVA-based R&R analysis was conducted on the results of the various laboratory plasticity tests detailed in the preceding sections. The testing matrix used for the R&R analysis is shown in Table 6. For each of the test methods, each operator conducted three replicate tests on the five samples. Hence, the data collected for analysis of each test method are from three different operators (*J*) measuring each of the five different parts (*I*) a total of three times (*m*).

The results of the analyses are summarized and compared to the R&R reported in ASTM D4318 in Table 7. For the Casagrande cup LL test, the R&R determined using the two-way ANOVA-based method presented herein (0.6% and 1.7%) are close to those reported in ASTM D4318 (0.5% and 1.3%).

The two-way ANOVA-based analysis results show that the overall variation (σ_{R&R}) of the fall cone LL tests is 0.7%, which is less than half that of the Casagrande cup test (1.8%). The analysis results can also identify the sources of error inherent in the test methods. For the fall cone test, the fraction of σ_{R&R}^{2} due to σ_{reproducibility}^{2} (i.e., between-operator error) is 50%. However, for the Casagrande cup test, 89% of the overall σ_{R&R}^{2} is contributed by the between-operator errors, even though all three operators were trained at the same time. Based on the two-way ANOVA analysis results, it can be concluded that the fall cone test used for measuring the LL is more consistent and less operator dependent than the Casagrande cup test. For the PL test conducted using the ASTM rolling device, the σ_{R&R} determined using the two-way ANOVA-based method is 0.7%, which is close to the multilaboratory value reported in ASTM D4318. The between-operator error of the PL test is still the main source of the overall variation (73%), which is expected because the testing method is somewhat subjective. The use of the ASTM PL rolling device produces more consistent 3.2 mm diameter threads compared to rolling by hand, which will improve both the repeatability and reproducibility of the PL test results. However, the R&R analysis was not specifically performed on the hand rolling method in this study.
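As a quick consistency check, the Casagrande cup values reported in this section (repeatability 0.6%, reproducibility 1.7%) can be combined in a few lines of Python. The variable names are illustrative, and the inputs are the rounded values from the text, so small rounding differences are to be expected.

```python
import math

# Rounded standard deviations reported for the Casagrande cup LL test (%).
sigma_repeatability = 0.6
sigma_reproducibility = 1.7

# Combined R&R standard deviation: root sum of squares of the two components.
sigma_rr = math.hypot(sigma_repeatability, sigma_reproducibility)

# Share of the R&R variance attributable to between-operator error.
fraction_reproducibility = sigma_reproducibility ** 2 / sigma_rr ** 2

print(f"sigma_R&R = {sigma_rr:.2f}%")
print(f"between-operator share = {100 * fraction_reproducibility:.0f}%")
```

The combined value is about 1.8% and the between-operator share about 89%, in line with the figures quoted for the Casagrande cup test, which confirms that the shares are variance ratios, not ratios of standard deviations.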

For the BLS test, σ_{R&R} is 1%, and the between-operator error accounts for 57% of the overall variation. As discussed in the previous section on correlations, however, the conventional PL and BLS test results were not sensitive to the bentonite content of the mixtures. Therefore, the five different samples (parts) prepared in this study could be regarded as nearly the same for these two tests, which may result in favorably smaller σ_{R&R} values.

##### 5.3. Measurement Capacity Ratio

The ASTM standards typically use the d2s limit (i.e., 1.960√2 × s ≈ 2.8s) to determine the acceptable range of two test results (ASTM E691 and E177), which is calculated based on either the single-operator or the multilaboratory standard deviation, s. Based on the two-way ANOVA-based R&R analysis results, the measurement capacity ratio (MCR), which is the precision-to-tolerance ratio of a measurement, can be used to quantify the errors from both the testing device and multiple operators.

The MCR can be used to determine whether a measurement is suitable for verifying the conformance of a measured characteristic to engineering specifications. The MCR can also be considered when setting specification ranges based on measurements. For example, if the lower (*L*) and upper (*U*) boundaries of a specification for the LL of a material are 30% and 45%, and the σ_{R&R} of the fall cone LL device is 0.7%, the MCR of the device can be calculated using Equation (9):

$$\mathrm{MCR} = \frac{6\,\sigma_{R\&R}}{U - L}. \tag{9}$$

With these values, 6 × 0.7%/(45% − 30%) gives an MCR value of 0.28.

According to Vardeman and Jobe [31], “It is common to treat some multiple of σ_{R&R} (often the multiplier is six, but sometimes 5.15 is used) as a kind of uncertainty associated with a measurement made using the gauge or measurement system in question,” and “*The hope is that measurement uncertainty is at least an order of magnitude smaller than the spread in specifications*,” which requires that the MCR be no larger than 0.1 in order to use the measurements to check conformance to such specifications. However, this target MCR value of 0.1 may be too strict for geotechnical applications and needs to be reevaluated for different materials and testing methods.
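The worked example and the 0.1 target can be reproduced with a short Python sketch. The function names are hypothetical; the MCR is taken as a multiple of the combined R&R standard deviation divided by the specification width, with a default multiplier of six, and the d2s limit as 1.960√2 times the standard deviation.

```python
import math


def measurement_capacity_ratio(sigma_rr, lower, upper, multiplier=6.0):
    """Precision-to-tolerance ratio: multiplier * sigma_R&R / (U - L)."""
    return multiplier * sigma_rr / (upper - lower)


def d2s_limit(s):
    """95% limit on the difference between two test results."""
    return 1.960 * math.sqrt(2.0) * s


# Values from the worked example in the text (all in % moisture content).
sigma_rr = 0.7
lower, upper = 30.0, 45.0

mcr = measurement_capacity_ratio(sigma_rr, lower, upper)
meets_target = mcr <= 0.1

# Largest sigma_R&R that would satisfy the 0.1 target for this specification.
max_sigma_rr = 0.1 * (upper - lower) / 6.0

print(f"MCR = {mcr:.2f}, meets 0.1 target: {meets_target}")
print(f"sigma_R&R needed for MCR <= 0.1: {max_sigma_rr:.2f}%")
print(f"d2s limit for s = {sigma_rr}%: {d2s_limit(sigma_rr):.2f}%")
```

For this specification width, the 0.1 target would require a combined R&R standard deviation of no more than 0.25%, which illustrates why the target may be too strict for the soil tests considered here.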

#### 6. Conclusions and Recommendations

In this study, several laboratory soil plasticity tests were selected for demonstration of the use of a two-way ANOVA-based R&R analysis to evaluate the repeatability, reproducibility, overall variation, source of error, and capacity of a given measurement. Such an analysis can provide useful suggestions for improvement of a testing method. The measurement capacity ratio (MCR) was also demonstrated, which considers errors from both the device and the interoperator variability, and it should therefore be considered when selecting QC/QA testing methods. Based on the findings of this study, the authors suggest using the two-way ANOVA-based analysis presented herein to determine the R&R and identify the sources of measurement error, and considering the MCR of a measurement when setting specifications or selecting QA/QC testing methods. Based on the laboratory testing results, some other key findings about correlations between the various testing methods are listed below:

(i) Correlations between the fall cone and Casagrande cup tests determined in the present and previous studies demonstrated that the fall cone test can be used to determine LL of a material with reduced variability between repeated tests. The two-way ANOVA-based repeatability and reproducibility analysis also revealed that the fall cone test can result in smaller overall variation than the Casagrande cup test, which is more prone to between-operator errors.

(ii) For measuring the PL and PI, the fall cone test and conventional test method using the ASTM plastic roller yielded significant discrepancies for the abraded crushed limestone granular materials with small percentages of bentonite incorporated. The fall cone test showed a dependence of PL on the bentonite content, whereas the conventional method was practically insensitive to the bentonite content. Further studies need to be conducted to evaluate the influence of the different testing mechanisms and whether PL is governed by the dominant minerals of a soil mixture.

(iii) The bar linear shrinkage results exhibited a linear correlation with the PI determined by conventional ASTM testing methods. However, as the PI increased significantly from 0 to 28% by incorporating bentonite, the corresponding BLS values were much less sensitive, exhibiting a change of only 6%. Moreover, the ranges of measured BLS values for the different bentonite contents overlapped, prohibiting a reasonably accurate correlation between BLS and PI.

#### Nomenclature

LL_{cone}: | Liquid limit by fall cone tests |

LL_{cup}: | Liquid limit by ASTM standardized Casagrande cup tests |

MCR: | Measurement capacity ratio |

PI_{ASTM}: | Plasticity index by ASTM standardized tests |

PI_{cone}: | Plasticity index by fall cone tests |

PL_{cone}: | Plastic limit calculated based on fall cone test results |

s: | Standard deviation |

σ_{repeatability}: | Repeatability standard deviation |

σ_{reproducibility}: | Reproducibility (between-operators) standard deviation |

σ_{R&R}: | Combined R&R standard deviation. |

#### Data Availability

The data used to support the findings of this study are available from the first author upon request.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.

#### Acknowledgments

The authors would like to thank the Iowa Department of Transportation for sponsoring this study. The financial support provided by the Natural Science Foundation of Shaanxi Province in China (grant no. 2019JQ-498), the Science and Technology Projects of Gansu Transportation Department (grant nos. 2019-16 and 2019-17), and the Opening Foundation of Research and Development Center of Transport Industry of Technologies, Materials and Equipments of Highway Construction and Maintenance (Gansu Road and Bridge Construction Group) (grant nos. GLKF201804 and GLKF201807) is greatly appreciated.