Abstract

Robustness against the possible occurrence of outlying observations is critical to the performance of a measurement process. Open questions relevant to statistical testing for candidate outliers are reviewed. A novel fuzzy logic approach is developed and exemplified in a metrology context. A simulation procedure is presented and discussed by comparing fuzzy versus probabilistic models.

1. Introduction

Measurement is intrinsically subject to uncertainty: according to the guide [1] ‘‘a measurement has imperfections that give rise to an error in the measurement result.’’ Thus, an accurate statistical analysis is important to optimize the estimation process.

Given a set of measurements, an outlier is an element significantly different from the others (see, e.g., [2] and the standard [3]). The detection of an outlier in a data set can be highly valuable: it can reveal an unforeseen phenomenon, a miscalibration or fault in the instrumentation, or a reporting mistake. Furthermore, it is important to make the correct decision about what to do with a candidate outlier: for this reason, different tests for outlier analysis have been studied.

Statistical tests screen a dataset in order to identify any candidate outliers, using hypothesis testing. Classical tests are the Grubbs [4] and the Dixon [5] ones; other similar tests were formulated, for example, by David and Quesenberry [6], Ferguson [7], and Thompson [8]. All these tests provide a statistic to be compared with a critical value in order to conclude whether the doubtful observation is an outlier or not. The only difference lies in the construction of the statistic and the choice of the critical value. Most of them detect one outlier at a time, so they have to be repeated several times on the screened dataset to detect any further outliers. Other tests, based on similar ideas and hypotheses, as in [9, 10], have been developed for the simultaneous detection of several outliers.

Using a Bayesian approach, some theoretical problems of the above tests are highlighted in this paper: to cope with such problems, an outlier analysis based on fuzzy logic is proposed, and a fuzzy treatment procedure is developed. Some works of related interest are [11] (general theory and application of fuzzy sets and systems), [12] (evidence theory with application to uncertainty treatment), [13–17] (fuzzy treatments of uncertainty in diverse fields, including measurement and temporal information), and [18, 19] (Bayesian approach to outlier processing).

The paper is organized as follows. In Section 2, some statistical tests for the detection of outliers in a set of observations are recalled. A review of these outlier tests is provided, focusing on the Grubbs and Dixon ones. Moreover, some numerical examples are provided, and a criticism of orthodox hypothesis testing is considered from a Bayesian point of view. In Section 3, a novel outlier treatment based on fuzzy logic is developed, with a simulation procedure implemented in Matlab™. Strategy and implementation aspects are detailed in Section 3.1. An application is reported in Section 3.2, where the procedure's performance is also discussed by comparing fuzzy versus probabilistic models. Finally, in Section 4 the inference system architecture is recapitulated, and concluding remarks are pointed out.

2. Statistical Tools: State of the Art and Open Questions

According to standard [20], an outlier is ‘‘an observation that appears to deviate markedly in value from other members of the sample in which it appears.’’ To process outliers, diverse tests have been originated and developed within the classical (Neyman-Pearson) hypothesis testing framework: a statistic is compared with a critical value related to a significance level $\alpha$, leading to rejection of the null hypothesis ‘‘the observation is not an outlier’’ if the statistic exceeds the critical value. Among others, the Grubbs and Dixon tests can be considered paradigmatic examples (see standard [20]). They are used to screen sampled datasets, aiming at detecting possible outliers one by one. Tests for simultaneous detection of multiple outliers at a time have also been proposed (e.g., [9]); however, in these tests the exact number of suspected outliers must be specified in advance, or at least an upper bound on this number must be known [10].

The Grubbs and Dixon tests are here briefly recalled, and a few examples are elaborated in order to discuss some relevant features, with application to a Gaussian distributed sample $x_1, \dots, x_n$.

A one-sided test looks for candidate outliers on one side only of the ordered dataset (i.e., either for the maximum or for the minimum), whereas in a two-sided test the dataset is screened on both sides. Focusing on the two-sided Grubbs test [4], the statistic $G = \max_i |x_i - \bar{x}| / s$ (1) is compared with a critical value $G_\alpha$ obtained from the Student's $t$-distribution, where $\bar{x}$ and $s$ are the sample mean and standard deviation, respectively, and $x_{\mathrm{out}}$ is the value that maximizes $|x_i - \bar{x}|$ over the data set. If $G > G_\alpha$, then $x_{\mathrm{out}}$ is considered an outlier at the related significance level $\alpha$.
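As an illustration, the statistic (1) can be computed as follows (Python is used here for the sketch; the sample values are hypothetical, not the paper's data):

```python
import math

def grubbs_statistic(data):
    """Two-sided Grubbs statistic: G = max_i |x_i - mean| / s."""
    n = len(data)
    mean = sum(data) / n
    # Sample standard deviation (n - 1 denominator), as in Grubbs' test.
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return max(abs(x - mean) for x in data) / s

# Hypothetical sample with one suspiciously large value.
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 12.5]
G = grubbs_statistic(sample)
```

The value G would then be compared with the tabulated critical value for the chosen significance level and sample size.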

Given the above data set, the Dixon test [5] yields the statistic $Q = |x_{\mathrm{out}} - x_{\mathrm{near}}| / (x_{\max} - x_{\min})$ (2), where $x_{\mathrm{out}}$ is the candidate outlier, $x_{\mathrm{near}}$ the observation closest to $x_{\mathrm{out}}$, and $x_{\max}$, $x_{\min}$ are the maximum and the minimum, respectively, among the $x_i$'s. The statistic $Q$ is compared with a critical value $Q_\alpha$ that can be found in a table [21], where critical values originally calculated by Dixon [22] are corrected by use of interpolation analysis.
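Similarly, a sketch of the Dixon statistic (2), testing the more extreme end of the ordered sample (the data are hypothetical again):

```python
def dixon_q(data):
    """Dixon's Q: gap between the suspect extreme and its nearest
    neighbour, divided by the range of the ordered sample."""
    xs = sorted(data)
    spread = xs[-1] - xs[0]
    q_low = (xs[1] - xs[0]) / spread      # lowest value as candidate
    q_high = (xs[-1] - xs[-2]) / spread   # highest value as candidate
    return max(q_low, q_high)

# Same hypothetical sample as above: the suspect is the maximum.
Q = dixon_q([9.8, 10.1, 10.0, 9.9, 10.2, 12.5])
```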

Whereas the Grubbs test is not limited to small samples, the original examples presented by Grubbs [4] pertain to samples of sizes 15 and 8. Moreover, the Dixon criterion is used on samples of small size only (see the original work of Dixon [5]). Consequently, case studies focused on datasets of size 10 are well suited to highlighting features and issues peculiar to this family of classical statistical tests.

Example 1. Let the sampled observations form data_set_1, and let the Grubbs and Dixon tests be mutually compared on the same observed value, both at the same significance level $\alpha$, for the null hypothesis ‘‘the observation is not an outlier.’’ By applying the Grubbs test, the statistic $G$ exceeds the critical value $G_\alpha$: thus the null hypothesis is rejected, and the observation is considered an outlier to be removed from data_set_1. On the contrary, by applying the Dixon test, the statistic $Q$ does not exceed the critical value $Q_\alpha$; therefore, the null hypothesis cannot be rejected: in this case the same value is not considered an outlier.
The two tests—even though both correctly performed in the same testing conditions—lead to divergent decisions. Moreover, if the decision is to discard the detected outlier from the dataset in view of further statistical processing, the surviving data cannot be considered a random sample of mutually independent observations: all of them are in fact associated by having passed the same selection criteria.
The Grubbs and Dixon tests can be repeated in order to detect more than one outlier (if any) in a given dataset. However, as shown in the following example, where data are Grubbs-tested for multiple outliers, a test may happen to be unstable. This is a consequence of the fact that the statistic $G$ in (1) is a function of $x_{\mathrm{out}}$ and of the sample mean $\bar{x}$ and standard deviation $s$: these parameters are subject to change with exclusion/inclusion of individual values.
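The instability can be reproduced numerically. The sketch below uses a hypothetical sample (not the paper's data_set_2) to show how duplicating the suspect value inflates the sample standard deviation and lowers G:

```python
import math

def grubbs_statistic(data):
    """Two-sided Grubbs statistic G = max_i |x_i - mean| / s."""
    n = len(data)
    m = sum(data) / n
    s = math.sqrt(sum((x - m) ** 2 for x in data) / (n - 1))
    return max(abs(x - m) for x in data) / s

# Hypothetical sample: nine identical readings and one suspect value.
base = [10.0] * 9 + [12.0]
g_single = grubbs_statistic(base)           # suspect appears once
g_double = grubbs_statistic(base + [12.0])  # suspect duplicated
# Duplicating the suspect inflates s, which lowers G and can move the
# statistic from above to below the tabulated critical value.
```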

Example 2. Let the observed sample form data_set_2. At the significance level $\alpha$, one single outlying value is detected. However, if the same value appears twice, that is, if the dataset is altered by duplicating it, the value would not be considered an outlier any longer (the initial inequality $G > G_\alpha$ is reversed after duplication of the individual value).
The conclusion is that drastic rejection or acceptance of a suspected outlier is not the best decision-making criterion: a better-posed criterion might be to assign the suspected value a weight for further processing purposes. This is the idea developed in the next section by using fuzzy logic. Before moving to fuzzy logic, it should be noted that the classical Neyman-Pearson approach to hypothesis testing is challenged by Bayesian statistics. The probability, conditioned on the involved dataset, for an observation to be a candidate outlier can be dramatically different from the probability referred to in the family of classical tests (see, e.g., [19] for a numerical example illustrating how classical hypothesis testing can be prone to misinterpretation and misuse).
According to the Bayesian approach, the test is formalized in terms of inverse probability. Thus, the posterior probability of the hypothesis after the data have been observed is obtained by means of the Bayes rule. To develop a Bayesian model for candidate outlier testing, let the propositions $H_0$ (the so-called null hypothesis) and $H_1$ (alternative hypothesis) be two mutually exclusive and exhaustive hypotheses under test, namely, $H_0$: “the observation is not an outlier;” $H_1$: “the observation is an outlier.” Let the proposition $E$: “the test result is positive for a suspected outlier,” represent the available evidence.
The conditional probabilities $P(E \mid H_0) = \alpha$ and $P(E \mid H_1)$ represent the test size (level of significance) and the power of the test, respectively. In tests based on orthodox statistics, two types of errors may occur in testing for outliers: a false detection may occur with probability $\alpha$ (type I error), or an outlier may be missed with probability $\beta = 1 - P(E \mid H_1)$ (type II error). A Bayesian approach to outlier testing is instead focused on computing the posterior probability of an observation being an outlier ($H_1$) given that the test result is positive for a suspected outlier ($E$), that is, $P(H_1 \mid E)$. This can be computed as follows.
In terms of propositional calculus, let $H$ and $E$ represent two propositional variables. Let $H \wedge E$, $H \vee E$, and $\neg H$ denote logical conjunction (‘‘$H$ and $E$’’), disjunction (‘‘$H$ or $E$’’), and negation (‘‘not $H$’’), respectively. Noting that $H_1 = \neg H_0$, $E$ can thus be partitioned into $E = (E \wedge H_0) \vee (E \wedge H_1)$. Application of the logical connectives to the probability function yields $P(E) = P(E \mid H_0)\,P(H_0) + P(E \mid H_1)\,P(H_1)$; finally, using the Bayes rule: $P(H_1 \mid E) = P(E \mid H_1)\,P(H_1) / \left[ P(E \mid H_0)\,P(H_0) + P(E \mid H_1)\,P(H_1) \right]$.
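For a numerical illustration, the posterior can be computed directly from the partition above (the prior, size, and power values below are illustrative assumptions):

```python
def posterior_outlier(prior, alpha, power):
    """P(H1 | E) from the partition P(E) = alpha*P(H0) + power*P(H1),
    with alpha = P(E|H0) (test size) and power = P(E|H1)."""
    evidence = alpha * (1.0 - prior) + power * prior
    return power * prior / evidence

# With a modest prior outlier rate, a positive test result still
# leaves the posterior probability of "outlier" far below 1.
p = posterior_outlier(prior=0.01, alpha=0.05, power=0.8)
```

This is the kind of gap between posterior probability and significance level that motivates the Bayesian criticism recalled above.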
The standard for dealing with outliers [20] remarks that rejection of aberrant observations should preferably rely upon physical, rather than statistical, grounds. On the basis of such a remark, the treatment of possible outliers can be developed from a fuzzy logic standpoint, aimed at capturing physical grounds and related hypotheses by means of fuzzy processing tools. The proposed approach starts from the consideration that the proposition ‘‘the observation is an outlier’’ and its negation can be modelled in fuzzy logic terms by assigning a truth degree, varying from zero (complete falsehood) to unity (full truth): the probability measure can thus be replaced by a purpose-built fuzzy outlierness degree, built according to the strategy presented in Section 3.1.
In terms of fuzzy sets, the logical connectives of conjunction, disjunction, and negation are translated into the fuzzy set-theoretic operations of intersection, union, and complementation, respectively. Following a standard model, originated by Zadeh [23] and further elaborated by Mamdani and Assilian [24], the fuzzy inference engine used in the present case study is detailed in the next section.

3. Fuzzy Treatment

3.1. Strategy and Implementation

To tackle the above open questions, an alternative treatment of candidate outlier detection and processing is developed here in the framework of fuzzy logic. Fuzzy logic is integrated within the framework of possibility theory (see, e.g., [12]), where a counterpart of the Bayes rule can be derived [25]. Thus, a fuzzy logic treatment of outliers is not prone to the Bayesian criticisms, unlike tests based on orthodox statistics.

For the purposes of the present work, such a treatment is based on classical fuzzy logic rules, as introduced by Zadeh [23]. The potential of fuzzy logic as a paradigm for uncertainty treatment in measurement has been studied in various works, such as [12–17, 26].

The focus here will be on criteria for transforming the outlier problem into fuzzy terms.

First of all, a definition of candidate outlying observations must be stated. According to the current use in technical literature (for distance-based approaches see, e.g., [27, 28]), the following definition is considered: an observation is a candidate outlier if its distance from a predefined reference value exceeds a given threshold. In fact, this definition makes explicit the assumptions underlying classical tests—see (1) and (2).

In these terms, the reference value and the threshold value are defined, respectively, as the mean $\mu$ and a multiple $k\sigma$ (with $k$ an integer value to be chosen at implementation) of the standard deviation $\sigma$ of a Gaussian probability density function (pdf).

Denoting by $f(x \mid \mu)$ the pdf of the observation $x$ conditioned on $\mu$, the Bayes formula is $f(\mu \mid x) = f(x \mid \mu)\,f(\mu)/f(x)$, where $f(\mu \mid x)$ is the posterior pdf, $f(\mu)$ the prior pdf, and $f(x \mid \mu)$ the likelihood function. Taking the prior pdf Gaussian, its mean and standard deviation are identified by $\mu_0$ and $\sigma_0$, respectively. Note that $\mu_0$ and $\sigma_0$ are parameters whose values must be set after an expert judgment (in metrology terms, a type B uncertainty estimation [1]) to initialize the algorithm. The values of $\mu_0$ and $\sigma_0$ are required to specify the prior pdf $f(\mu)$, so they must be preset before starting the measurement process.

To design a suitable fuzzy strategy, some steps are required, so as to introduce the notion of a fuzzy degree qualifying an observation as a candidate outlier: for short, the outlierness degree $\mu_{\mathrm{out}}$. The strategy can be detailed by introducing the distance $d$ of the observation from the reference value, and the percentage expert's estimate of uncertainty, expressed by $u$ (the case $u = 0$ is not covered in this approach). For instance, once the threshold multiple $k$ is set, the outlierness degree of a single observation can be computed according to the following inference scheme, which includes two inputs (fuzzy distance and uncertainty) and one output (outlierness).
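A plausible reading of the two crisp inputs can be sketched as follows (the exact formulas for d and u are assumptions here, chosen to match the descriptions above):

```python
def fuzzy_inputs(x, mu0, sigma0):
    """Crisp inputs to the inference system (assumed forms):
    the candidate's distance from the prior mean, scaled by sigma0,
    and the expert uncertainty as a percentage of |mu0| (mu0 != 0)."""
    d = abs(x - mu0) / sigma0
    u = 100.0 * sigma0 / abs(mu0)
    return d, u

# Hypothetical candidate and prior parameters.
d, u = fuzzy_inputs(x=12.5, mu0=10.0, sigma0=0.5)
```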

The fuzzy distance is obtained after a fuzzification of the distance $d$, according to:
(i) if …, then distance is very long;
(ii) if …, then distance is long;
(iii) if …, then distance is medium;
(iv) if …, then distance is short.

The fuzzy uncertainty is obtained as follows:
(v) if …, then uncertainty is ample;
(vi) if …, then uncertainty is moderate;
(vii) if …, then uncertainty is narrow.

As to outlierness:
(viii) if …, then outlierness is high;
(ix) if …, then outlierness is intermediate;
(x) if …, then outlierness is low.
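The linguistic terms above are typically represented by triangular or trapezoidal membership functions, as in Figure 1. A minimal sketch of such shapes (the breakpoints a, b, c, d are free parameters, not the paper's settings):

```python
def tri(x, a, b, c):
    """Triangular membership: 0 outside [a, c], peak 1 at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def trap(x, a, b, c, d):
    """Trapezoidal membership: 0 outside [a, d], flat 1 on [b, c]."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return 1.0 if x <= c else (d - x) / (d - c)
```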

The inference system is based on the following ten rules:
(R1) if (distance is short) and (uncertainty is narrow), then (outlierness is low);
(R2) if (distance is short) and (uncertainty is moderate), then (outlierness is not high);
(R3) if (distance is short) and (uncertainty is ample), then (outlierness is intermediate);
(R4) if (distance is medium) and (uncertainty is narrow), then (outlierness is not high);
(R5) if (distance is medium) and (uncertainty is moderate), then (outlierness is intermediate);
(R6) if (distance is medium) and (uncertainty is ample), then (outlierness is not low);
(R7) if (distance is long) and (uncertainty is narrow), then (outlierness is intermediate);
(R8) if (distance is long) and (uncertainty is moderate), then (outlierness is not low);
(R9) if (distance is long) and (uncertainty is ample), then (outlierness is high);
(R10) if (distance is very long), then (outlierness is high).

The inference engine is the basic Mamdani model [24] (with if-then rules, min-max set operations, sum for composition of activated rules, and defuzzification based on the centroid method) available from the Matlab™ Fuzzy Logic Toolbox. (Identification of commercial products in this paper does not imply recommendation or endorsement, nor does it imply that the products identified are necessarily the best available for the purpose.) Here, the fuzzification is detailed in terms of fuzzy distance, fuzzy uncertainty, and outlierness. The membership functions (depicted by triangular or trapezoidal shapes in Figure 1) reflect expert-based choices, made after selection from a purposely implemented interactive menu.
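A reduced two-rule version of such a Mamdani engine can be sketched in Python (the paper's system has ten rules and runs in Matlab; all membership bounds below are illustrative assumptions):

```python
def fall(x, a, b):
    """Left-shoulder membership: 1 up to a, linearly down to 0 at b."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)

def rise(x, a, b):
    """Right-shoulder membership: 0 up to a, linearly up to 1 at b."""
    return 1.0 - fall(x, a, b)

def outlierness(d, u):
    """Two-rule Mamdani sketch on a sampled output universe [0, 1]:
    'and' = min, aggregation = max, centroid defuzzification."""
    w_low = min(fall(d, 1, 3), fall(u, 3, 8))    # short & narrow -> low
    w_high = min(rise(d, 2, 4), rise(u, 8, 15))  # long & ample -> high
    grid = [i / 100 for i in range(101)]
    # Clip each consequent at its rule strength, then take the max.
    agg = [max(min(w_low, fall(z, 0.0, 0.5)),
               min(w_high, rise(z, 0.5, 1.0))) for z in grid]
    area = sum(agg)
    return sum(z * m for z, m in zip(grid, agg)) / area if area else 0.5

high = outlierness(d=5.0, u=12.0)  # far value, ample uncertainty
low = outlierness(d=0.5, u=2.0)    # close value, narrow uncertainty
```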

The Mamdani model is well suited to capturing and coding expert-based knowledge in view of performing targeted simulations; accordingly, the system's performance is tuned using heuristic criteria: Figures 1 and 2 illustrate its typical behaviour.

The outlierness degree $\mu_{\mathrm{out}}$ is obtained by application of the centroid defuzzification method. This provides the abscissa of the barycentre of the fuzzy set composed according to the activated rules. The overall functioning of the rules is summarized in the 3D graph in Figure 2.
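On a sampled output universe, the centroid method reduces to a weighted average; a minimal sketch:

```python
def centroid(zs, mus):
    """Abscissa of the barycentre of a sampled fuzzy set, i.e. the
    aggregated output composed from the activated rules."""
    total = sum(mus)
    return sum(z * m for z, m in zip(zs, mus)) / total if total else 0.0

# A symmetric aggregated set defuzzifies to the centre of its support.
out = centroid([0.0, 0.25, 0.5, 0.75, 1.0], [0.0, 0.5, 1.0, 0.5, 0.0])
```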

3.2. Application and Discussion

The resulting $\mu_{\mathrm{out}}$ is used to determine a weight entered in processing the data set for estimation purposes. Each individual value in the data set is assigned a weight $w$, whose assignment rules are:
(w1) if …, then … (fully outlier);
(w2) if …, then … (fully inlier);
(w3) otherwise … (fuzzy outlier).
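A possible weight-assignment sketch (the thresholds and the complement mapping used in rule (w3) are assumptions, since the paper's numerical settings are not reproduced above):

```python
def weight(mu_out, lo=0.2, hi=0.8):
    """Hypothetical weight assignment from the outlierness degree.
    Thresholds lo/hi and the (w3) mapping are illustrative assumptions."""
    if mu_out >= hi:
        return 0.0           # (w1) fully outlier: no contribution
    if mu_out <= lo:
        return 1.0           # (w2) fully inlier: full weight
    return 1.0 - mu_out      # (w3) fuzzy outlier: partial weight
```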

In this way any fuzzy outlier, not being discarded from the data set, still contributes with its own weight to the final estimated value.
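The contribution to the final estimate can then be sketched as a weighted mean (the values and weights below are hypothetical):

```python
def weighted_estimate(values, weights):
    """Weighted sample mean: fuzzy outliers are down-weighted rather
    than discarded, so every observation still contributes."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total

# Hypothetical data: the last value is a fuzzy outlier with w = 0.3.
est = weighted_estimate([10.0, 10.2, 12.5], [1.0, 1.0, 0.3])
```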

To assess the performance of the fuzzy treatment compared to the Grubbs and Dixon tests, a numerical example is reported with reference to data_set_1 of Example 1. According to the Grubbs test, the suspected value is an outlier; on the contrary, according to the Dixon test the same suspected value is not an outlier. Such a disagreement is successfully managed by the fuzzy treatment, which assigns the value an outlierness degree instead.

The final result of the fuzzy procedure is influenced by the values assigned to its parameters. For an example setting of these parameters, the fuzzy procedure performance is shown in Figure 3 with application to the candidate value. The candidate turns out to be a fuzzy outlier with an intermediate outlierness degree $\mu_{\mathrm{out}}$ and, according to assignment rule (w3), it is assigned the corresponding weight $w$.

The efficacy of this fuzzy treatment is supported by another example, developed with application to the candidate outlier taken from data_set_2 of Example 2. Here, the Grubbs test and the Dixon test yield mutually contradictory results (the Grubbs test detects the outlier, the Dixon test does not). Moreover, when an extra candidate outlier is introduced into the data set, a failure of the Grubbs test has been noted. Figure 4 shows how this candidate is detected and assigned its outlierness degree $\mu_{\mathrm{out}}$.

4. Conclusion

The presence of suspected outlying values in measurements has given rise to a long-standing problem. Its difficulty is mainly due to the lack of sharp criteria for outlier detection and treatment in an estimation process. The classical statistical approach to candidate outlier detection and treatment has been reviewed, highlighting some problems that have been discussed at a logical level. To overcome some of these problems, a novel fuzzy logic approach has been proposed and a system has been implemented. The system performance has been tuned by simulations: optimization and integration for prospective in-process metrology are envisaged as further developments.

The notion of a fuzzy outlier has been introduced and specified in terms of an outlierness degree founded on metrological rather than statistical grounds (as suggested by the standard [20]). Such a degree is computed as the result of a 2-input/1-output fuzzy inference system. A Bayesian estimation process is referred to in the designed strategy. The expert-based estimates of the mean and of the standard deviation of the prior pdf in the Bayes rule are used to initialize the process. Independence is not required by the Bayes rule, and the fuzzy treatment of data is not affected by statistical dependence. Thus, whereas preservation of independence may be a problem for orthodox statistical tests, it is not for the proposed treatment of outliers. Fuzzifications of a candidate outlier's distance from the mean value and of the standard deviation provide the inputs to the fuzzy inference system. The outlierness degree is obtained by centroid defuzzification.

In the light of the results of the research work presented and discussed so far, the following conclusions can be pointed out:
(i) compared to orthodox hypothesis testing for outliers, such as the Grubbs and Dixon tests, the developed fuzzy approach is not prone to the criticisms raised by Bayesian statistics;
(ii) the outlierness degree can be conveniently translated into a relative weight assigned to an outlier entering an estimation process;
(iii) the efficacy of the proposed fuzzy inference system has been demonstrated on heuristic grounds, with successful management of case studies where orthodox tests would lead to mutually divergent decision-making.