quantile {stats}    R Documentation

Sample Quantiles

Description

The generic function quantile produces sample quantiles corresponding to the given probabilities. The smallest observation corresponds to a probability of 0 and the largest to a probability of 1.

Usage

quantile(x, ...)

## Default S3 method:
quantile(x, probs = seq(0, 1, 0.25), na.rm = FALSE,
         names = TRUE, type = 7, ...)

Arguments

x numeric vector whose sample quantiles are wanted. Missing values are not allowed unless na.rm is TRUE.
probs numeric vector of probabilities with values in [0,1].
na.rm logical; if true, any NA and NaN values are removed from x before the quantiles are computed.
names logical; if true, the result has a names attribute. Set to FALSE for speedup with many probs.
type an integer between 1 and 9 selecting one of the nine quantile algorithms detailed below.
... further arguments passed to or from other methods.
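
As an illustration of the na.rm and names arguments, here is a minimal sketch on a small made-up vector. The data and the variable names x, res, q and q_plain are assumptions for illustration only. It relies on the default method signalling an error when x contains NA and na.rm = FALSE.

```r
# Illustrative data (made up for this sketch); one value is missing.
x <- c(2, 4, NA, 8, 16)

# With the default na.rm = FALSE the call fails; try() captures the error.
res <- try(quantile(x), silent = TRUE)
inherits(res, "try-error")

# With na.rm = TRUE the NA is dropped before the quantiles are computed.
q <- quantile(x, na.rm = TRUE)
q

# names = FALSE returns a plain numeric vector without the "25%"-style names.
q_plain <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
q_plain
```

Dropping the names is only a convenience here; per the argument description it matters mainly for speed when probs is long.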

Details

A vector of length length(probs) is returned; if names = TRUE, it has a names attribute.

NA and NaN values in probs are propagated to the result.

Types

quantile returns estimates of underlying distribution quantiles based on one or two order statistics from the supplied elements in x at probabilities in probs. One of the nine quantile algorithms discussed in Hyndman and Fan (1996), selected by type, is employed.

Sample quantiles of type i are defined by

Q[i](p) = (1 - gamma) x[j] + gamma x[j+1],

where 1 <= i <= 9, (j-m)/n <= p < (j-m+1)/n, x[j] is the jth order statistic, n is the sample size, and m is a constant determined by the sample quantile type. Here gamma depends on the fractional part of g = np+m-j.

For the continuous sample quantile types (4 through 9), the sample quantiles can be obtained by linear interpolation between the points (p(k), x[k]), where x[k] is the kth order statistic and p(k) is given by:

p(k) = (k - alpha) / (n - alpha - beta + 1),

where alpha and beta are constants determined by the type. Further, m = alpha + p (1 - alpha - beta), and gamma = g.
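
The definition above can be checked by hand for the continuous types. The following is a rough sketch, not the code used by quantile itself: hf_quantile is a hypothetical helper written for this illustration, the sample is made up, and clamping out-of-range indices to the smallest or largest observation is an assumption of the sketch. The values alpha = beta = 1 (type 7) and alpha = beta = 1/3 (type 8) follow from matching the p(k) formulas listed under the continuous types.

```r
# Hypothetical helper (not part of stats): continuous sample quantile
# built directly from Q(p) = (1 - gamma) x[j] + gamma x[j+1] with gamma = g.
hf_quantile <- function(x, p, alpha, beta) {
  xs <- sort(x)                          # order statistics x[1] <= ... <= x[n]
  n  <- length(xs)
  m  <- alpha + p * (1 - alpha - beta)   # type-dependent constant m
  j  <- floor(n * p + m)                 # index of the lower order statistic
  g  <- n * p + m - j                    # fractional part, used as gamma
  # Assumption of this sketch: indices outside 1..n are clamped to the
  # smallest or largest observation.
  lower <- xs[pmin(pmax(j, 1), n)]
  upper <- xs[pmin(pmax(j + 1, 1), n)]
  (1 - g) * lower + g * upper
}

x <- c(3.1, 0.4, 7.9, 5.5, 2.2, 9.6, 1.0)   # made-up sample
p <- c(0.1, 0.25, 0.5, 0.9)

# Type 7 corresponds to alpha = beta = 1, type 8 to alpha = beta = 1/3.
manual7 <- hf_quantile(x, p, alpha = 1, beta = 1)
manual8 <- hf_quantile(x, p, alpha = 1/3, beta = 1/3)
all.equal(manual7, quantile(x, p, type = 7, names = FALSE))
all.equal(manual8, quantile(x, p, type = 8, names = FALSE))
```

The sketch omits any guard against floating-point rounding in floor(); for the continuous types a small error in j changes the result only negligibly, because the interpolant is continuous at the knots.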

Discontinuous sample quantile types 1, 2, and 3

Type 1
Inverse of empirical distribution function.
Type 2
Similar to type 1 but with averaging at discontinuities.
Type 3
SAS definition: nearest even order statistic.
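
To see how the three discontinuous types differ, a small sketch on a made-up sample of four values may help. The data and the names x, p and disc are assumptions for illustration. With n = 4 the quartile probabilities fall exactly on the jumps of the empirical distribution function, which is where type 2 averages two neighbouring order statistics.

```r
x <- c(10, 20, 30, 40)        # made-up sample, n = 4
p <- c(0.25, 0.5, 0.75)

# One column per discontinuous type, one row per probability
disc <- sapply(1:3, function(type) quantile(x, p, type = type, names = FALSE))
dimnames(disc) <- list(paste0(p * 100, "%"), paste("type", 1:3))
disc

# Types 1 and 3 only ever return observed values; type 2 may return midpoints.
all(disc[, "type 1"] %in% x)
all(disc[, "type 3"] %in% x)
disc[, "type 2"] %in% x
```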

Continuous sample quantile types 4 through 9

Type 4
p(k) = k / n. That is, linear interpolation of the empirical cdf.
Type 5
p(k) = (k - 0.5) / n. That is, a piecewise linear function where the knots are the values midway through the steps of the empirical cdf. This is popular amongst hydrologists.
Type 6
p(k) = k / (n + 1). Thus p(k) = E[F(x[k])]. This is used by Minitab and by SPSS.
Type 7
p(k) = (k - 1) / (n - 1). In this case, p(k) = mode[F(x[k])]. This is used by S.
Type 8
p(k) = (k - 1/3) / (n + 1/3). Then p(k) =~ median[F(x[k])]. The resulting quantile estimates are approximately median-unbiased regardless of the distribution of x.
Type 9
p(k) = (k - 3/8) / (n + 1/4). The resulting quantile estimates are approximately unbiased if x is normally distributed.

Hyndman and Fan (1996) recommend type 8. The default method is type 7, as used by S and by R < 2.0.0.
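
The plotting positions p(k) listed for types 4 through 9 can be tabulated from the general formula p(k) = (k - alpha) / (n - alpha - beta + 1). The sketch below is illustrative only: the alpha and beta values are the ones implied by the per-type formulas above, and the sample xs and the names ab, pk and knots_ok are made up for this example. It also checks that evaluating each type at its own p(k) gives back the kth order statistic, which is what interpolating through the points (p(k), x[k]) implies.

```r
# alpha and beta implied by the p(k) formulas for types 4 to 9
ab <- data.frame(type  = 4:9,
                 alpha = c(0, 0.5, 0, 1, 1/3, 3/8),
                 beta  = c(1, 0.5, 0, 1, 1/3, 3/8))

n  <- 5
k  <- 1:n
xs <- c(1.5, 2.0, 4.5, 7.0, 11.0)   # made-up, already sorted sample

# One row of plotting positions p(1), ..., p(n) per type
pk <- t(sapply(seq_len(nrow(ab)), function(i)
  (k - ab$alpha[i]) / (n - ab$alpha[i] - ab$beta[i] + 1)))
dimnames(pk) <- list(paste("type", ab$type), paste0("k=", k))
round(pk, 3)

# Evaluating each type at its own p(k) should give back the order statistics
knots_ok <- sapply(seq_len(nrow(ab)), function(i)
  isTRUE(all.equal(quantile(xs, pk[i, ], type = ab$type[i], names = FALSE), xs)))
knots_ok
```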

Author(s)

Ivan Frohne and Rob J Hyndman (authors of the version used in R >= 2.0.0).

References

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

Hyndman, R. J. and Fan, Y. (1996) Sample quantiles in statistical packages, American Statistician, 50, 361–365.

See Also

ecdf for empirical distributions of which quantile is the “inverse”; boxplot.stats and fivenum for computing “versions” of quartiles, etc.

Examples

x <- rnorm(1001)
quantile(x)  # extremes and quartiles by default
quantile(x, probs = c(0.1, 0.5, 1, 2, 5, 10, 50, NA) / 100)  # the NA in probs is propagated

### Compare different types
p <- c(0.1, 0.5, 1, 2, 5, 10, 50) / 100
res <- matrix(as.numeric(NA), 9, 7)  # one row per type, one column per probability
for (type in 1:9) res[type, ] <- y <- quantile(x, p, type = type)
dimnames(res) <- list(1:9, names(y))  # column names are the percentage labels
round(res, 3)

[Package stats version 2.1.0 Index]