Journal of Probability and Statistics

Volume 2014 (2014), Article ID 326579, 6 pages

http://dx.doi.org/10.1155/2014/326579

## Defining Sample Quantiles by the True Rank Probability

^{1}VTT Technical Research Centre of Finland, 02044 Espoo, Finland

^{2}Berakon, Espoo, Finland

Received 30 June 2014; Revised 10 November 2014; Accepted 11 November 2014; Published 8 December 2014

Academic Editor: Z. D. Bai

Copyright © 2014 Lasse Makkonen and Matti Pajari. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Many definitions exist for sample quantiles and are included in statistical software. The need to adopt a standard definition of sample quantiles has been recognized and different definitions have been compared in terms of satisfying some desirable properties, but no consensus has been found. We outline here that comparisons of the sample quantile definitions are irrelevant because the probabilities associated with order-ranked sample values are known exactly. Accordingly, the standard definition for sample quantiles should be based on the true rank probabilities. We show that this allows more accurate inference of the tails of the distribution, and thus improves estimation of the probability of extreme events.

#### 1. Introduction

The quantile $Q(p)$ of a continuous, strictly monotonic distribution function $F(x)$ is defined as $Q(p) = F^{-1}(p)$, where $p$ is the probability of nonexceedance of a variable value. When the distribution is unknown, sample quantiles provide estimators of their population counterparts based on a set of $n$ independent order-ranked observations $x_1 \le x_2 \le \cdots \le x_n$. The associated sample probabilities are then $p_m = P(x \le x_m)$, $m = 1, \ldots, n$, where $p_m$ is the probability of a new sampled value $x$ being less than or equal to $x_m$. These nonexceedance probabilities are those defining the cumulative distribution function (CDF).
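To make the definition concrete, the following minimal sketch evaluates a population quantile as the inverse of a CDF. The exponential distribution and the rate value are assumptions chosen only for illustration (they do not appear in the article), and the names `cdf` and `quantile` are hypothetical.

```python
import math

RATE = 2.0  # hypothetical rate parameter of an exponential distribution


def cdf(x, rate=RATE):
    """Nonexceedance probability F(x) = 1 - exp(-rate * x) for x >= 0."""
    return 1.0 - math.exp(-rate * x)


def quantile(p, rate=RATE):
    """Quantile Q(p) = F^{-1}(p) = -ln(1 - p) / rate for 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return -math.log(1.0 - p) / rate


# Round trip: applying F to Q(p) should recover p up to rounding error.
for p in (0.5, 0.9, 0.99):
    x = quantile(p)
    print(f"p = {p:<5} Q(p) = {x:.4f}  F(Q(p)) = {cdf(x):.4f}")
```

When the distribution is unknown, no such closed form is available, which is why the probabilities attached to the ranked observations matter.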

Many different formulas for defining sample quantiles have been used in the literature and in statistical software. This has caused considerable confusion, in particular in extreme value analysis, where the probabilities of rare events need to be estimated. In a widely cited article, Hyndman and Fan [1] identified this problem and emphasized the need to adopt a standard definition for sample quantiles. The same problem was discussed again by Langford [2], who identified twelve different sample quantile definitions that are used in statistical software.

Hyndman and Fan [1] analysed nine different sample quantile definitions. They selected six “desirable properties” for an estimator of a sample quantile and considered how well different definitions satisfy them. This approach is similar to judging the plotting position estimators by five “postulates,” as done by Gumbel [3], and by three “purposes,” as done by Kimball [4]. Hyndman and Fan [1] proposed $p_m = (m - 1/3)/(n + 1/3)$ to be used as the basis of the standard definition.
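How much the choice of formula matters in the tails can be checked with exact arithmetic. The sketch below evaluates the Hyndman and Fan recommendation, $(m - 1/3)/(n + 1/3)$, as one member of the commonly used family $(m - a)/(n + 1 - 2a)$. The family parametrization, the other two values of $a$, the sample size, and the function name `plotting_position` are assumptions introduced here for illustration only.

```python
from fractions import Fraction


def plotting_position(m, n, a):
    """Probability (m - a) / (n + 1 - 2a) assigned to the m-th of n ranked values."""
    a = Fraction(a)
    return (m - a) / (n + 1 - 2 * a)


n = 10  # hypothetical sample size
choices = {
    "a = 0": Fraction(0),
    "a = 1/3 (Hyndman and Fan)": Fraction(1, 3),
    "a = 1/2": Fraction(1, 2),
}

# Exceedance probability 1 - p assigned to the largest observation (m = n).
for label, a in choices.items():
    p_largest = plotting_position(n, n, a)
    print(f"{label:<28} p_n = {p_largest}  1 - p_n = {1 - p_largest}")
```

For the largest of ten observations, the three choices assign exceedance probabilities that differ by roughly a factor of two, which is the kind of disagreement that affects estimates of rare events.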

However, the definition of the quantile has not yet been standardized. Modern statistical software packages, such as *Matlab*, *Excel*, *SciPy*, and *STATA*, include different definitions and offer user-selected options for the formulation of the quantile function, as well as for plotting positions in quantile plots and quantile-quantile plots; see, for example, Castillo-Gutiérrez et al. [5]. The inability to agree on a standard definition has arisen from the many proposals [6–8] and the subjective nature of the “criteria” and “desired properties.”
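The same ambiguity can be seen inside a single library. As an illustration outside the packages named above, Python's standard-library function `statistics.quantiles` accepts a `method` argument with two options; according to its documentation, `"exclusive"` places the $i$th of $m$ sorted points at $i/(m+1)$, whereas `"inclusive"` places it at $(i-1)/(m-1)$. The data set below is hypothetical.

```python
import statistics

data = list(range(1, 11))  # hypothetical sample: 1, 2, ..., 10

# Quartile cut points under the two definitions offered by the library.
q_exclusive = statistics.quantiles(data, n=4, method="exclusive")
q_inclusive = statistics.quantiles(data, n=4, method="inclusive")

print("exclusive:", q_exclusive)
print("inclusive:", q_inclusive)
```

The two methods agree on the median of this sample but return different first and third quartiles, so the reported quantile depends on an option the user may not have chosen deliberately.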

Since the quantile function is the inverse of the cumulative distribution function, the quality of its definition must be judged by how close the probabilities defined by it are to the true probabilities of the cumulative distribution function. Thus, the definition of a sample quantile function should be based on the true nonexceedance probabilities. It was pointed out by Makkonen [9] that, for order-ranked data, they are known exactly. We outline here two rigorous proofs of this conclusion and show how the appropriate definition for the sample quantile function follows from it.

#### 2. Sample Probabilities

We present in the following two deductions of the probability $p_m = P(x \le x_m)$ that a new observation $x$ does not exceed the $m$th order-ranked value $x_m$.

Consider in Figure 1 an order-ranked sample (a) of $n$ random observations $x_1 \le \cdots \le x_n$ (white circles) and a new observation $x$ (grey circle) sampled randomly from the population, the distribution of which is unknown. In the new sample (b), obtained by including the new observation, the new value may fall in any interval of the original sample or be smaller than $x_1$ or larger than $x_n$. In the new sample of $n + 1$ values, each observation has the same probability of being the smallest one. In particular, $P(x \le x_1) = 1/(n + 1)$.
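The symmetry argument can be checked numerically. The following Monte Carlo sketch assumes the conclusion that a new observation falls at or below the smallest of $n$ ranked values with probability $1/(n+1)$, and tests it for two parent distributions. The sample size, trial count, seed, choice of distributions, and the function name `estimate_rank_probability` are all assumptions made for illustration.

```python
import random


def estimate_rank_probability(n, m, trials, sampler, seed=0):
    """Estimate P(new value <= m-th smallest of n observations) by simulation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = sorted(sampler(rng) for _ in range(n))
        new_value = sampler(rng)  # drawn from the same population
        if new_value <= sample[m - 1]:
            hits += 1
    return hits / trials


n = 9  # hypothetical sample size, so that 1 / (n + 1) = 0.1
samplers = {
    "normal": lambda rng: rng.gauss(0.0, 1.0),
    "exponential": lambda rng: rng.expovariate(1.0),
}

estimates = {
    name: estimate_rank_probability(n, 1, 20000, sampler)
    for name, sampler in samplers.items()
}

for name, estimate in estimates.items():
    print(f"{name:<12} estimated {estimate:.4f}  expected {1 / (n + 1):.4f}")
```

Both estimates should lie close to 0.1 even though the two parent distributions differ, consistent with the claim that the probability depends only on the rank and the sample size.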