Abstract

The newly developed statistical technique of two-way ordinal analysis of variation (ORDANOVA) was applied for the first time to sensory responses in combination with multinomial ordered logistic regression of a response category vs. chemical composition. A corresponding tutorial is provided. As a case study, samples of a sausage from different producers, purchased at the same time from a market, were compared based on sensory responses of experienced experts. A decomposition of total variation of the ordinal data and simulation of the multinomial distribution of the relative frequencies of the responses in different categories showed a statistically significant difference between the producers’ samples, and an insignificant difference between the experts’ responses related to the same sample. The capabilities of experts were also evaluated. The influence of chemical composition of a sausage sample on the probability of a response category was modeled using multinomial ordered logistic regression of the response on mass fractions of the main sausage components. This statistical technique can be helpful for understanding sources of variation of sensory responses on food quality properties. It is also promising for a revision of specification limits for chemical composition, as well as for the prediction of sensory properties when the chemical composition of the product is subject to quality control.

1. Introduction

Despite the development of sophisticated chemical analytical methods and modern equipment, sensory analysis based on human organoleptic examination of food products, such as a sausage, will be always important for the people who are consumers of the products.

The international guidelines for sensory analysis [1] recommend the use of quantitative scales for human (assessor/expert) responses. For example, sensory responses on properties of a sausage quality can be classified by the following five quality categories: (1) very bad; (2) poor; (3) satisfactory; (4) good; and (5) excellent. Analysis of variance (ANOVA) is applied for interpretation of such property values [24]. Other statistical methods developed mainly for quantitative continuous values, including regression analysis, are also applied [5, 6]. An implementation of fuzzy logic for modification of sensory responses to obtain comparable scores [7, 8] has attracted attention. Statistical methods for sensory analysis, in particular for meat and meat products, are regularly reviewed [9, 10].

However, responses on an ordinal property scale are categorical data for which a total ordering relation can be established according to the magnitude, with other quantities of the same kind, but no algebraic operations exist among those quantities [11, 12]. Their legitimate operations can be “equal/unequal” and “greater than/less than” [13]. The addition of categorical data is not a legitimate operation by definition, whereas one of the ANOVA assumptions is that the treated quantities are additive [14] and calculation of their mean and variance, is possible. Therefore, statistical techniques based on the ANOVA should not be applied directly to ordinal data.

Sensory responses may also depend on the chemical composition of a food product [15, 16]. Relationships between sensory evaluation and the chemical composition or physical properties of meat and meat products are a subject of research [17]. Regression analysis is the known tool for studying and modeling such relationships. However, as in the applications of ANOVA, the additivity assumption should not be violated in the regression analysis use [18].

The aim of the present paper is to overcome the “additivity” problem by implementation of the newly developed two-way ordinal analysis of variation (ORDANOVA) [19] combined with multinomial ordered logistic regression [2022]. As a case study, samples of a kind of commercially available sausage from different producers were compared based on the ordinal data from the examination of quality properties by experienced experts. The influence of the chemical composition of a sausage sample on the probability of a response category was modeled.

2. Materials and Methods

2.1. Statistical Approach–A Tutorial
2.1.1. Two-Way ORDANOVA without Replication

The approach of ORDANOVA is to calculate for each property the number of expert responses related to the same category, and then to analyze their relative frequency as a fraction of the total number of the responses for all categories.

A vector of expert responses for a given property as a random variable on an ordinal scale with categories (classes or levels k = 1, 2,…, K) is characterized by a probability vector , where is the theoretical probability of responses related to the kth category and . Let denotes the cumulative theoretical probability up to the kth category, and . The probability P of receiving a set of responses , where is the number of responses related to the kth category , can be calculated based on the multinomial distribution with parameters (N, p) as the probability mass function [23]:

Note that the probability of the event is by definition. Variability in is explained in the present study by two independent factors as follows: sausage producers and experts as random factors. The examination of the samples of sausage was a properly conducted blind test, in which an expert has no opportunity to favor the sausage of a particular producer. No interaction between the two factors can be analyzed, since only one response at the specified levels of the factors (at each cell in the cross-over balanced design) was obtained, by analogy with laboratory proficiency testing [24]. The first factor had I levels (sausage samples of I producers were tested) and the second factor had levels (J experts examined the samples’ quality). There were responses in total for a given property, each of them was of one of the categories of variable Y. Being a result of assessment of a sausage sample of producer i (one of the levels of the first factor , i = 1, 2,…, I), each response was also associated with the expert j (one of the levels of the second factor , j = 1, 2,…, J). Thus, it was assumed that each one of the  = N cells (i, j) contains only one response belonging to one of the categories k. In other words, a response at each cell (i, j) was of one category k chosen by expert j when examined a sample of producer i, and correspondingly, the number of responses of the chosen categories  = 1 and zero for all other categories in the same cell.

Treating N responses as a statistical sample and as a random variable, then, for the chosen k, and zero for all other categories in the same cell. Thus, denotes the sample cumulative frequency of responses of categories l = 1, 2,…, k, i.e., up to the kth category in cell (i, j); is the sample cumulative relative frequency of responses up to the th category at level of factor ; denotes the cumulative relative frequency at level of factor ; denotes the cumulative relative frequency of all responses up to the th category:

The total sample variation of the response variable Y, normalized to the [0, 1] interval, is defined in the two-way ORDANOVA model [19] as follows:

In the model without replication, is partitioned between the (inter) covariation component and the within (intra) residual variation , i.e., . For the present study, the variationcharacterizes the between-producer variation of the responses, while the variationis the within-producer variation.

The individual effects of factors and (producers and experts, respectively) can be evaluated using the decomposition , where

Another decomposition, helpful for comparing the capability of the participating experts (as a group) to identify different categories, consists of evaluating the following parts of related to the responses of categories less than or equal to category k:

For the first category k = 1 of a sausage property by standard [25], the corresponding part is as a response of category less than 1 does not exist. For the second category, is the variation of the expert responses of categories k= 1 and 2 (up to 2) because of the cumulative properties of the relative frequencies in (7). Therefore, the variation of responses of the participating experts for category k= 2 is . For the third category k = 3, the variation is , and so on. A greater value of indicates a weaker capability to identify category . Note that the capability characterizing dispersion of the responses is analogous to measurement reproducibility [12]. When the cumulative relative frequencies in (7) achieve 1, the variation This is always the case for the last category K, but may also be for (K – 1) or even for (K – 2) [26], depending on the form of the cumulative distribution function.

The null hypothesis of homogeneity of the responses states that the probability of classifying the responses as belonging to the -th category does not depend on the levels of the first factor (levels ) nor on those of the second factor (levels ), i.e., for all and . Under this hypothesis, the following relations are applicable:where E(…) denotes the mathematical expectation of the variation in parentheses; are degrees of freedom. The numerator of the last term in equation (8) is equal to the population of the total ordinal variation corresponding to the probability vector .

To check the statistical significance of both the factor effects, the following significance indices (test statistics) have been defined:

Testing the null hypothesis on the effect significance requires knowledge of at least the asymptotical distribution of the index for calculation of the critical values of the indices at a given level of confidence .

A calculator tool for this purpose was proposed for the two-way ORDANOVA in reference [19]. The tool calculates from the empirical data the sample vector of relative frequencies , as well as the variation components and the empirical significance indices . The critical values for the indices in equation (9) are recovered through a Monte Carlo simulation based on at least 10000 trials. The null hypothesis is rejected when the significance index exceeds the critical value at the level of confidence, concluding that a statistically significant effect on the response variable is detected. The calculator is freely available by following the link [27].

2.1.2. Multinomial Ordered Logistic Regression

Multinomial ordered logistic regression (ordered logit) is quite commonly applied in medicine [28], insurance, actuarial, and financial applications [29, 30], as well as to describe consumer purchasing behavior [31]. The ordered logit model can be considered as an extension of the logistic regression model for dichotomous dependent variables, when more than two ordinal response categories are used. It is applied here for analysis of ordinal quality sensory responses vs. chemical composition of a sausage. This model is based on the following concepts.

When Y is an ordinal outcome with K categories, the cumulative probability of responses of categories l less than or equal to a category k is . The odds of responses being less than or equal to a category k can be defined as , for k = 1,…, K − 1. When k = K, the denominator is zero and the odds are not defined. The log odds, called logit, is defined as . The ordinal logistic regression model is parameterized as follows:where βk0 is the intercept (cutoff point) for category k; c1 to cm are the contents (mass fractions) of the main sausage components (continuous latent variables); β1 to βm are the corresponding regression coefficients (slopes), constant across categories. Note that this model is based on the parallel regression (proportional odds) assumption: the logit dependences on sausage compositions are parallel hyperplanes for different categories k, and hence, the intercepts are different for each category but the slopes are constant across categories. The odds of being less than or equal to category k are as follows:

A computer program developed in the R environment for calculation of model parameters, including their confidence intervals and goodness-of-fit measures for the model, is described, for example, at the web page [32]. The following logit format is implemented in the R program as follows:where η = −β for all the regression coefficients of components’ contents c1 to cm. Inverting equation (12), the probability of getting a response of a certain category k or below can be obtained [33] as follows:

The “PoLR” function of the “MASS” package [34] was used to fit multinomial ordered logistic models to the experimental data, and function “predictorEffects” of the “effects” package [35] was used to calculate and plot probabilities of obtaining a response equal to a certain category k.

Goodness-of-fit of these models can be evaluated by calculating one of the several pseudo-R values [32], which estimates the variability in the outcome of the fitted model. For example, McFadden’s pseudo-R2 is defined as follows:where Mfull is a full model with predictors; Mintercept is the model without predictors, i.e., containing the intercept only; and L is the estimated likelihood.

When the Mfull model does not predict the outcome better than the Mintercept model, its ln L(Mfull) is not much greater than ln L(Mintercept); hence, the corresponding ratio is close to 1 and the McFadden’s pseudo-R2 is close to 0: the model has poor predictive value. Conversely, when the Mfull model is good, its ln L(Mfull) is close to zero (since the likelihood value for each observation is close to 1), and McFadden’s pseudo-R2 is close to 1, indicating a successful predictive ability. If comparing two models on the same data, McFadden’s pseudo-R2 would be higher for the model with the greater likelihood.

The “PseudoR2” function of the R package “DescTools” [36] was applied for the corresponding calculations. Note that correlations between contents of the sausage main components may affect the regression coefficients and p values, but they do not influence the predictions, precision of predictions, or the goodness-of-fit statistics [22].

2.2. Experimental Methods

The comparative testing of a sausage as a consumer product [37] was organized by V.M. Gorbatov Federal Research Center for Food Systems, Russia. Samples of boiled-smoked sausage “Moscowskaya” [38] from I = 16 producers for comparison were purchased from a market in 2021, practically simultaneously. This sausage is prepared from beef and pork fat with addition of sodium nitrite curing salt (0.4–0.6 % mass fraction of NaNO2 in NaCl) and spices. Its main chemical components are protein, fat, moisture, and salt.

All samples were examined before their expiration dates (set by the producers) by J = 3 assessors/experts [12] of the Research Center, women of 45–55 years old, each having more than 15 years’ experience in sensory analysis of meat products. Examination of samples was performed in a test room [39] with individual testing cubicles.

Five quality sensory properties of the samples were assessed as follows: (1) appearance and packaging, named here “appearance”; (2) consistency; (3) color and appearance of cut sausage, named here “color”; (4) taste; and (5) smell. An expert response related to each quality property was ordered by K = 5 categories from “very bad” to “excellent” (k = 1, 2,…,5). A total number N = I × J = 48 responses was obtained for each property, and 48 × 5 = 240 responses for the five properties.

Contents (mass fractions) of the m = 4 main components were taken from the certificates of the producers. In total, I × m = 64 continuous quantitative values were obtained.

The data set, including both qualitative and quantitative data (RawData Microsoft Excel file) is available at Zenodo [40].

2.2.1. Methods of Sensory Examination

The experts who participated in the examination of the sausage samples were not informed about quantitative characteristics of the samples’ chemical composition. The quality parameters of a sample, which was one whole packaged sausage, were examined by a standard protocol [25], in the following sequence: (a) appearance–by visual external examination of the packaged sausage; (b) consistency–by pressing with a spatula or fingers on the outside of the sausage, and then, after removing the casing from the sausage and cutting with a sharp knife into thin slices perpendicular to the surface of the product; (c) color–visually; (d) and (e) taste and smell–by pressing and chewing a slice.

2.2.2. Methods of Testing Chemical Composition

The standard Kjeldahl method was used for measurement of protein content c1 [41]. The standard method used for measurement of fat content c2 [42] is based on multiple extractions from a dried sample with a solvent (hexane, diethyl ether, or petroleum ether) in a Soxhlet fat-extraction apparatus. Then, the solvent is removed and the fat dried to constant weight. The standard measurement method for moisture content c3 [43] consists of drying a sample with sand to constant weight at a temperature of (103 ± 2)°C. Salt content c4 was measured by Mohr’s standard titration method [44]. Measurement results are expressed as percent mass fractions. More details including estimation of measurement uncertainties in testing a sausage composition, its specification limits and conformity assessment, are described in the paper [45].

3. Results and Discussion

A flowchart of the data treatment is presented in Figure 1. ORDANOVA starts from calculation of frequencies of expert responses (of different categories) and evaluation of the total variation. The next step is decomposition of the total variation into components with the purpose to assess the effects of two factors influencing the variation: “producer” and “expert.” Another kind of decomposition allows assessment of the abilities of the experts to determine different categories of the same quality property. The components of variation obtained are used for testing hypotheses on the homogeneity of the producers (i.e., the responses to their sausage quality properties) and homogeneity of the experts (their responses to a property of the same sausage sample). When responses of different experts are inhomogeneous, and/or the responses to different sausage producers are homogeneous, the analysis is ended. Otherwise, it is assumed that the difference between responses to the sausage quality of different producers is caused by the differences in the sausage chemical composition, expressed as mass fractions of the main components. This hypothesis is tested with multinomial ordered regression analysis. If any of the regression coefficients are statistically significant (the hypothesis is not rejected), probabilities of obtaining a response related to a specific category for different component contents are calculated. The last step is plotting such probabilities for visualization of the results and their discussion.

3.1. Implementation of Two-Way ORDANOVA without Replication

Frequencies of the responses from the data set are shown in Table 1 by categories (rows) and experts for each quality property of the sausage (columns). All purchased sausage samples were of higher quality than category 2 for any property. Two out of three experts rated the appearance of the whole sausage samples greater than category 3. Also, all experts reported that the sausage consistency was greater than category 3.

Total variation of the responses by equation (3), partitioned into the between-producer variation by equation (4) and the within-producer residual variation by equation (5) are presented in Table 2, which includes the individual effects and of factors and (producers and experts, respectively) evaluated using the decomposition by equation (6). To check the statistical significance of both the factor effects the significance indices and were also calculated by equation (9) at the degrees of freedom  = 15 and  = 2. The critical index values at the 95% level of confidence in Table 2 were obtained using simulations of distributions [27].

There is a statistically significant difference at 95 % level of confidence between the producers related to all the quality parameters of the sausage (appearance, consistency, color, taste, and smell). This difference is called the “inhomogeneity” of the producers. At the same time, the significance index values of the expert factor do not exceed its critical value at the 95 % level of confidence, i.e., the hypotheses on the “homogeneity” of expert responses with regard to each of the five sausage properties are not rejected.

Decomposition of the between producers’ variation into th parts according to the response categories up to by equation (7) and corresponding values are shown in Table 3. Comparing the capability of the participating experts (as a group) to identify different categories, it is important to remember that a greater value of variation indicates a weaker capability to identify the particular category k. This decomposition demonstrates that identification of very bad and poor sausage quality (k = 1 and 2, respectively) was the simplest task for the experts:  = 0 and  = 0 as no expert responses of these categories were obtained for any property. The most complicated part of their examination was to identify a difference between satisfactory and good quality (k = 3 and 4, respectively). The variation up to the last category equals zero by definition of equation (7). However, the number of responses of the category of perfect quality (k = 5) was maximum for each property shown in Table 1, and detection of this category was not a problem for the experts. It is also interesting that the maximum controversial responses here are on the sausage taste and especially on smell. Of course, the results shown in Table 3 do not mean in general that the examined ordinal variables have collapsed into a binary group.

3.2. Implementation of the Multinomial Ordered Logistic Regression

Intervals of the sausage main component contents in the data set, their means, and standard deviations are shown in Table 4.

The multinomial ordered logistic regression model by equation (12) was fitted to the component contents in order to predict appearance, color, taste, and smell, assessed by experts according to the three categories shown in Table 1, k = 3, 4, and 5. A logistic regression for dichotomous (binary) outcome variables was used for prediction of consistency, since the corresponding expert responses (Table 1) were only of two categories, k = 4 and 5. Since for each categorical variable, the responses were found to be homogeneous among the three experts in the ORDANOVA study, all their outcomes were taken together, constituting the set of values to be used in the regression. The calculated results are presented in Table 5, where the estimates for coefficients βk0 and η related to each component content are reported with their standard errors and 95 % confidence intervals (from the 2.5 % to the 97.5 % quantile). The η subscripts from 1 to 4 correspond to the subscripts of the component contents c.

The estimated odds ratios, derived by exponentiating the coefficients, and McFadden’s pseudo-R2 values by equation (14) are also shown in the table. For example, the model by equation (12) of category k = 3 for appearance is logit  = 108.31 − 1.65 c1 − 0.95 c2 − 1.24 c3 + 2.22 c4. A one-unit increase in the protein content c1, for example, corresponds to the increase in the expected value of the logit by 1.65 (on the log odds scale), when all the other variables in the model are held constant. The corresponding odds ratio exp(1.65) = 5.2 indicates that, for every unit increase in the protein content, the odds of a sausage of having a better appearance outcome (k = 4 or 5, versus k = 3) is multiplied 5.2 times.

Note if a confidence interval does not cross zero, the parameter estimate is statistically significant. However, the confidence interval for the estimate η4 = 2.22 (of the regression coefficient of the salt content c4) crosses zero and this means that η4 is statistically not significant in this case. In other words, the salt content values in the interval shown in Table 5 do not influence the probability of the category appearance of a whole sausage.

Probabilities P of obtaining a response of category k by dependence on protein content c1, calculated at the mean values of contents of other main components (Table 4), are shown in Figure 2(a). In the case of three categories of the observed responses (k = 3, 4 and 5), the probability  = ,  =  − , and  = 1 −  − , where and can be evaluated by equation (13). The values for k = 3, 4, and 5 are shown by lines 1, 2, and 3 in Figure 2(a), respectively. The P dependences on fat content c2 and on moisture content c3 are shown in Figures 2(b) and 2(c), respectively. They were also calculated at contents of other main components equal to their observed mean values in Table 4. The influence of all the component contents on the probability values is very similar, but the probability curves vs. contents of protein and moisture are cut at the lower limits of the content intervals (their minimum values observed in the comparison). The full picture is shown in Figure 2(b) for the probabilities vs. contents of fat, where the probability values of the appearance categories vary from zero to the maximum, or from 1 to 0 and vice versa. Note that increasing c1, c2, and c3 leads to an increasing probability of the responses of the highest appearance category k = 5 (excellent quality).

The salt content c4 does not influence probabilities of responses of consistency dichotomous categories k = 4 and 5; hence, it was removed from the list of regressors. The confidence interval of the intercept βk0 in the consistency model in Table 5 crosses zero. However, the intercept closeness to zero does not reflect influence of a component content on the probabilities of consistency categories. Since responses of two categories were obtained, the probability of one of them is the complement to the other, i.e.,  = 1 −  and vice versa. Therefore, only is shown in Figure 3 to illustrate the probability dependence on component contents.

The probabilities of responses of different color categories do not depend on the contents of fat and salt, c2 and c4, in their observed intervals. Model “” without these two variables has the same McFadden’s pseudo-R2 value 0.09 as full model “Color,” and practically, the same regression coefficients for c1 and c3 in Table 5. Probabilities P of responses of different color categories in dependence on content of protein c1 are shown in Figure 4(a), and on moisture content c3 in Figure 4(b).

The full models for taste and smell are the best fitting models among the qualitative sausage properties; their McFadden’s pseudo-R2 values in Table 5 are about two to three times greater than those for appearance, color, and consistency. Dependences of probabilities P of responses of different taste and smell categories on the contents of main components, calculated at contents of other main components equal to their observed mean values, are shown on Figures 5 and 6, respectively.

In general, the maximum probability of responses of each category of taste and smell is reached at increasing contents of the influencing main components. Similar effects are also observed in the plots in Figure 2 for appearance as follows: the first category reaching its maximum probability in the studied ranges of the component contents is 3, then 4, and finally 5, i.e., higher categories are more probable with greater contents of components. However, the salt contents in the interval considered in this study do not significantly influence responses on appearance, color, and consistency. At the same time, taste (Figure 5) and smell (Figure 6) are influenced by the salt contents in a reverse order than contents of other main components; the greater the salt content, the lower category is the more probable.

The probabilities of responses of the excellent quality category of both, taste and smell, increase with contents of protein up to c1 of about 17 %; fat, c2 of about 26 %; and moisture, c3 of about 58 %, while the minimal salt content c4 = 2.2 % is preferable. These estimates could be helpful for a revision of the specification limits of the sausage composition, necessarily taking into account the mass balance constraint; the sum of actual values of mass fractions of the four main components should be equal to 100 % or to a positive fraction less than 100 % [45].

4. Conclusions

A data set of ordinal responses of three experienced experts who assessed five quality properties of samples of a boiled, smoked sausage from sixteen producers was analyzed. The responses were ordered using five categories. Implementation of ORDANOVA allowed decomposition of total variation of the ordinal data and simulation of the multinomial distribution of the relative frequencies of the responses in different categories. A statistically significant difference in quality properties of the sausages from different producers was detected, while the difference between responses of the experts was insignificant.

Capabilities of the experts to identify different categories of the quality properties were also evaluated. It was shown that identification of “very bad” and “poor” quality, as well as “perfect” quality is the simplest task for the experts. The most complicated part of their examination was to identify a difference between “satisfactory” and “good” quality–the closest categories.

Influence of chemical composition of a sausage sample on the probability of a response category was modeled using the multinomial ordered logistic regression of the response category on mass fractions of four main sausage components. Obtained estimates could be helpful for a revision of the specification limits of the sausage composition, as well as for prediction of the product sensory properties when its chemical composition is under quality control.

Data Availability

All relevant data are included in the paper or the data set is available at Zenodo [40]. https://doi.org/10.5281/zenodo.7213008.

Conflicts of Interest

The authors declare that they have no conflicts of interest to disclose.

Acknowledgments

This research was supported in part by the International Union of Pure and Applied Chemistry (Project 2021-017-2-500) and the Ministry of Science and Education of the Russian Federation (Grant Agreement 075-15-2020-775).