cor {stats} | R Documentation |

`var`, `cov` and `cor` compute the variance of `x` and the covariance or correlation of `x` and `y` if these are vectors. If `x` and `y` are matrices then the covariances (or correlations) between the columns of `x` and the columns of `y` are computed.

`cov2cor` scales a covariance matrix into the corresponding correlation matrix *efficiently*.

    var(x, y = NULL, na.rm = FALSE, use)

    cov(x, y = NULL, use = "all.obs",
        method = c("pearson", "kendall", "spearman"))

    cor(x, y = NULL, use = "all.obs",
        method = c("pearson", "kendall", "spearman"))

    cov2cor(V)

| Argument | Description |
|---|---|
| `x` | a numeric vector, matrix or data frame. |
| `y` | `NULL` (default) or a vector, matrix or data frame with compatible dimensions to `x`. The default is equivalent to `y = x` (but more efficient). |
| `na.rm` | logical. Should missing values be removed? |
| `use` | an optional character string giving a method for computing covariances in the presence of missing values. This must be (an abbreviation of) one of the strings `"all.obs"`, `"complete.obs"` or `"pairwise.complete.obs"`. |
| `method` | a character string indicating which correlation coefficient (or covariance) is to be computed. One of `"pearson"` (default), `"kendall"`, or `"spearman"`, can be abbreviated. |
| `V` | symmetric numeric matrix, usually positive definite, such as a covariance matrix. |

For `cov` and `cor` one must *either* give a matrix or data frame for `x` *or* give both `x` and `y`.
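
The two calling conventions can be sketched as follows, using the built-in `longley` data set. The variable names `m`, `xy` and `cross` are illustrative only and not part of the documented interface.

```r
# A data frame (or matrix) alone: correlations between all pairs of columns.
m <- cor(longley)
dim(m)

# Two vectors: a single correlation coefficient.
xy <- cor(longley$GNP, longley$Employed)

# Two matrices or data frames: columns of x against columns of y.
cross <- cor(longley[, 1:2], longley[, 3:7])
dim(cross)

# The same pair of variables gives the same value in all three forms.
stopifnot(all.equal(m["GNP", "Employed"], xy),
          all.equal(cross["GNP", "Employed"], xy))
```

In the matrix form, the result has one row per column of `x` and one column per column of `y`.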

`var` is just another interface to `cov`, where `na.rm` is used to determine the default for `use` when that is unspecified. If `na.rm` is `TRUE` then the complete observations (rows) are used (`use = "complete"`) to compute the variance. Otherwise (`use = "all"`), `var` will give an error if there are missing values.
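
A minimal sketch of how `na.rm` relates to `use`, on a made-up vector `x` with one missing value. The outcome without `na.rm` may differ in other R versions, so that call is wrapped in `try()` here rather than assumed to fail.

```r
x <- c(2, 4, NA, 8)   # illustrative data with one missing value

# na.rm = TRUE: only the complete observations are used.
v_rm <- var(x, na.rm = TRUE)

# The same request expressed through `use` directly.
v_use <- var(x, use = "complete.obs")

# Both agree with the variance of the observed values alone.
stopifnot(all.equal(v_rm, var(c(2, 4, 8))),
          all.equal(v_rm, v_use))

# With na.rm = FALSE the missing value is not dropped. This page documents
# an error in that case; try() keeps the script running if one is raised.
res <- try(var(x), silent = TRUE)
```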

If `use` is `"all.obs"`, then the presence of missing observations will produce an error. If `use` is `"complete.obs"` then missing values are handled by casewise deletion. Finally, if `use` has the value `"pairwise.complete.obs"` then the correlation between each pair of variables is computed using all complete pairs of observations on those variables. This can result in covariance or correlation matrices which are not positive semidefinite. `"pairwise.complete.obs"` only works with the `"pearson"` method for `cov` and `var`.
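
The difference between casewise and pairwise deletion can be checked by hand on a small example. The data frame `d` below is made up for illustration; the sketch assumes nothing beyond `cov` and `complete.cases`.

```r
d <- data.frame(a = c(1, 2, 3, 4, 5),
                b = c(2, NA, 6, 8, 11),
                c = c(5, 3, NA, 1, 0))

# Casewise deletion: every row containing an NA is dropped first.
cc <- cov(d, use = "complete.obs")
stopifnot(all.equal(cc, cov(d[complete.cases(d), ])))

# Pairwise deletion: each entry uses the rows complete for that pair only.
pw <- cov(d, use = "pairwise.complete.obs")
ok <- !is.na(d$a) & !is.na(d$b)
stopifnot(all.equal(pw["a", "b"], cov(d$a[ok], d$b[ok])))
```

Because each entry of `pw` may be based on a different subset of rows, the entries need not fit together into a positive semidefinite matrix.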

The denominator *n - 1* is used, which gives an unbiased estimator of the (co)variance for i.i.d. observations. These functions return `NA` when there is only one observation (whereas S-PLUS has been returning `NaN`), and fail if `x` has length zero.
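
As a quick check of the *n - 1* denominator, the variance and covariance can be recomputed by hand. The vectors `x` and `y` are arbitrary illustrative data.

```r
x <- c(3, 5, 8, 13)
y <- c(1, 4, 4, 9)
n <- length(x)

# Sum of squared deviations divided by n - 1, not n.
manual_var <- sum((x - mean(x))^2) / (n - 1)
manual_cov <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)

stopifnot(all.equal(manual_var, var(x)),
          all.equal(manual_cov, cov(x, y)))

# A single observation leaves a zero denominator; the result is NA.
single <- var(42)
```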

For `cor()`, if `method` is `"kendall"` or `"spearman"`, Kendall's *tau* or Spearman's *rho* statistic is used to estimate a rank-based measure of association. These are more robust and have been recommended if the data do not necessarily come from a bivariate normal distribution.
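
One way to see what "rank-based" means in practice is a relationship that is perfectly monotone but far from linear. The data below are constructed for illustration only.

```r
x <- 1:20
y <- exp(x / 2)   # strictly increasing in x, but strongly nonlinear

r_pearson  <- cor(x, y)                       # linear association only
r_spearman <- cor(x, y, method = "spearman")  # depends on ranks only
r_kendall  <- cor(x, y, method = "kendall")   # depends on pair orderings only

# The rank-based measures see a perfect monotone relationship;
# Pearson's coefficient stays below 1 because the relation is not linear.
stopifnot(all.equal(r_spearman, 1),
          all.equal(r_kendall, 1),
          r_pearson < 1)
```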

For `cov()`, a non-Pearson method is unusual but available for the sake of completeness. Note that `"spearman"` basically computes `cor(R(x), R(y))` (or `cov(.,.)`) where `R(u) := rank(u, na.last="keep")`. In the case of missing values, the ranks are calculated depending on the value of `use`, either based on complete observations, or based on pairwise completeness with reranking for each pair.
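
The rank identity for `"spearman"` can be verified directly in the case without missing values. The helper `R` mirrors the definition given in the text; the vectors are illustrative and include ties on purpose.

```r
R <- function(u) rank(u, na.last = "keep")

x <- c(10, 3, 7, 7, 1, 12)
y <- c(4, 9, 2, 6, 6, 11)

rho     <- cor(x, y, method = "spearman")
cov_rho <- cov(x, y, method = "spearman")

# Spearman's method is Pearson's method applied to the ranks.
stopifnot(all.equal(rho, cor(R(x), R(y))),
          all.equal(cov_rho, cov(R(x), R(y))))
```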

Prior to R 2.1.0, the ranking was done removing only cases that are missing on the variable itself.

Scaling a covariance matrix into a correlation one can be achieved in many ways, mathematically most appealing by multiplication with a diagonal matrix from left and right, or more efficiently by using `sweep(.., FUN = "/")` twice. The `cov2cor` function is even a bit more efficient, and provided mostly for didactical reasons.
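
The three routes mentioned above can be compared side by side. This is only a sketch: the names `s`, `D`, `r_diag`, `r_sweep` and `r_fast` are illustrative, and the actual definition of `cov2cor` is best read by printing the function.

```r
V <- cov(longley)
s <- sqrt(diag(V))            # standard deviations

# 1. Multiplication with a diagonal matrix from left and right.
D <- diag(1 / s)
r_diag <- D %*% V %*% D
dimnames(r_diag) <- dimnames(V)

# 2. sweep() twice: divide rows, then columns, by the standard deviations.
r_sweep <- sweep(sweep(V, 1, s, FUN = "/"), 2, s, FUN = "/")

# 3. cov2cor().
r_fast <- cov2cor(V)

stopifnot(all.equal(r_diag, r_fast),
          all.equal(r_sweep, r_fast),
          all.equal(r_fast, cor(longley)))
```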

For `r <- cor(*, use = "all.obs")`, it is now guaranteed that `all(r <= 1)`.

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) *The New S Language*. Wadsworth & Brooks/Cole.

`cor.test` for confidence intervals (and tests).

`cov.wt` for *weighted* covariance computation.

`sd` for standard deviation (vectors).

    var(1:10)       # 9.166667

    var(1:5, 1:5)   # 2.5

    ## Two simple vectors
    cor(1:10, 2:11) # == 1

    ## Correlation Matrix of Multivariate sample:
    (Cl <- cor(longley))
    ## Graphical Correlation Matrix:
    symnum(Cl) # highly correlated

    ## Spearman's rho and Kendall's tau
    symnum(clS <- cor(longley, method = "spearman"))
    symnum(clK <- cor(longley, method = "kendall"))
    ## How much do they differ?
    i <- lower.tri(Cl)
    cor(cbind(P = Cl[i], S = clS[i], K = clK[i]))

    ## cov2cor() scales a covariance matrix by its diagonal
    ##           to become the correlation matrix.
    cov2cor # see the function definition {and learn ..}
    stopifnot(all.equal(Cl, cov2cor(cov(longley))),
              all.equal(cor(longley, method = "kendall"),
                        cov2cor(cov(longley, method = "kendall"))))

    ##--- Missing value treatment:
    C1 <- cov(swiss)
    range(eigen(C1, only.values = TRUE)$values) # 6.19  1921
    swM <- swiss
    swM[1, 2] <- swM[7, 3] <- swM[25, 5] <- NA # create 3 "missing"
    try(cov(swM)) # Error: missing obs...
    C2 <- cov(swM, use = "complete")
    range(eigen(C2, only.values = TRUE)$values) # 6.46  1930
    C3 <- cov(swM, use = "pairwise")
    range(eigen(C3, only.values = TRUE)$values) # 6.19  1938

    (scM <- symnum(cor(swM, method = "kendall", use = "complete")))
    ## Kendall's tau doesn't change much: identical symnum codings!
    identical(scM, symnum(cor(swiss, method = "kendall")))

Package *stats* version 2.1.0