chisq.test {stats} | R Documentation |

`chisq.test` performs chi-squared contingency table tests and goodness-of-fit tests.

chisq.test(x, y = NULL, correct = TRUE, p = rep(1/length(x), length(x)), rescale.p = FALSE, simulate.p.value = FALSE, B = 2000)

`x` |
a vector or matrix. |

`y` |
a vector; ignored if `x` is a matrix. |

`correct` |
a logical indicating whether to apply continuity correction when computing the test statistic. |

`p` |
a vector of probabilities of the same length as `x`.
An error is given if any entry of `p` is negative. |

`rescale.p` |
a logical scalar; if TRUE then `p` is rescaled
(if necessary) to sum to 1. If `rescale.p` is FALSE, and
`p` does not sum to 1, an error is given. |

`simulate.p.value` |
a logical indicating whether to compute p-values by Monte Carlo simulation. |

`B` |
an integer specifying the number of replicates used in the Monte Carlo simulation. |
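As a brief illustration of how `p` and `rescale.p` interact, the sketch below uses made-up counts and weights (the names `counts` and `weights` are illustrative and not part of the API). It assumes that rescaling amounts to dividing `p` by its sum, which is consistent with the argument description above.

```r
# Illustrative counts and unnormalised weights (hypothetical data).
counts  <- c(89, 37, 30, 28, 2)
weights <- c(40, 20, 20, 15, 5)      # sums to 100, not 1

# Without rescaling, weights that do not sum to 1 are rejected.
bad <- try(chisq.test(counts, p = weights), silent = TRUE)
inherits(bad, "try-error")           # TRUE

# rescale.p = TRUE should match normalising the weights by hand
# (assumption: rescaling divides p by sum(p)).
res_rescaled <- chisq.test(counts, p = weights, rescale.p = TRUE)
res_manual   <- chisq.test(counts, p = weights / sum(weights))
all.equal(res_rescaled$statistic, res_manual$statistic)
```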

If `x` is a matrix with one row or column, or if `x` is a vector and `y` is not given, then a *goodness-of-fit test* is performed (`x` is treated as a one-dimensional contingency table). The entries of `x` must be non-negative integers. In this case, the hypothesis tested is whether the population probabilities equal those in `p`, or are all equal if `p` is not given.
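The goodness-of-fit computation described above can be checked by hand. The following is a minimal sketch with illustrative counts; the names `stat_manual`, `df_manual` and `p_manual` are made up for this example, and the degrees of freedom are assumed to be the number of categories minus one.

```r
# Observed counts in three categories (illustrative data).
x <- c(A = 20, B = 15, C = 25)

# Null probabilities: all equal, which is also the default for 'p'.
p <- rep(1 / length(x), length(x))

# Expected counts and the Pearson statistic computed by hand.
expected    <- sum(x) * p
stat_manual <- sum((x - expected)^2 / expected)

# Asymptotic p-value from the chi-squared distribution.
df_manual <- length(x) - 1
p_manual  <- pchisq(stat_manual, df = df_manual, lower.tail = FALSE)

# Compare with the built-in test.
res <- chisq.test(x)
all.equal(unname(res$statistic), stat_manual)
all.equal(res$p.value, p_manual)
```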

If `x` is a matrix with at least two rows and columns, it is taken as a two-dimensional contingency table. Again, the entries of `x` must be non-negative integers. Otherwise, `x` and `y` must be vectors or factors of the same length; incomplete cases are removed, the objects are coerced into factor objects, and the contingency table is computed from these. Pearson's chi-squared test is then performed of the null hypothesis that the joint distribution of the cell counts in a 2-dimensional contingency table is the product of the row and column marginals.
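Under the independence null described above, the expected count in each cell is the product of its row and column totals divided by the grand total. A minimal sketch with an illustrative 2-by-2 table follows; `expected_manual`, `stat_manual` and `df_manual` are names invented for this example, and continuity correction is switched off so that the plain Pearson statistic is compared.

```r
# Illustrative 2 x 2 table of counts.
x <- matrix(c(12, 5, 7, 7), ncol = 2)
n <- sum(x)

# Expected counts: product of the marginals, scaled by the total.
expected_manual <- outer(rowSums(x), colSums(x)) / n

# Pearson statistic and degrees of freedom computed by hand.
stat_manual <- sum((x - expected_manual)^2 / expected_manual)
df_manual   <- (nrow(x) - 1) * (ncol(x) - 1)

# Compare with the built-in test, without continuity correction.
res <- chisq.test(x, correct = FALSE)
all.equal(res$expected, expected_manual, check.attributes = FALSE)
all.equal(unname(res$statistic), stat_manual)
```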

If `simulate.p.value` is `FALSE`, the p-value is computed from the asymptotic chi-squared distribution of the test statistic; continuity correction is used only in the 2-by-2 case, and only if `correct` is `TRUE`. Otherwise, if `simulate.p.value` is `TRUE`, the p-value is computed by Monte Carlo simulation with `B` replicates.
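To make the effect of `correct` concrete, the sketch below compares the corrected and uncorrected statistics on an illustrative 2-by-2 table. It assumes the correction is of the Yates type, subtracting 0.5 from each absolute difference between observed and expected counts; that reading matches this table, where every difference exceeds 0.5, but the exact rule may vary between R versions. The names `stat_yates` and `y` are made up for the example.

```r
# Illustrative 2 x 2 table of counts.
x <- matrix(c(12, 5, 7, 7), ncol = 2)
expected <- outer(rowSums(x), colSums(x)) / sum(x)

# Yates-style statistic: shrink each |observed - expected| by 0.5.
# Assumes every |observed - expected| is at least 0.5, as it is here.
stat_yates <- sum((abs(x - expected) - 0.5)^2 / expected)

res_corrected   <- chisq.test(x)                   # correct = TRUE by default
res_uncorrected <- chisq.test(x, correct = FALSE)
all.equal(unname(res_corrected$statistic), stat_yates)

# For a table larger than 2 x 2, 'correct' should make no difference.
y <- matrix(c(20, 15, 25, 30, 10, 20), ncol = 3)
all.equal(chisq.test(y, correct = TRUE)$statistic,
          chisq.test(y, correct = FALSE)$statistic)
```

The corrected statistic is smaller, so the corrected p-value is larger and the test is more conservative.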

In the contingency table case, the simulation is done by random sampling from the set of all contingency tables with the given marginals, and works only if the marginals are positive. (A C translation of the algorithm of Patefield (1981) is used.)
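The idea of sampling tables with fixed marginals can be sketched with `r2dtable()` from the *stats* package, whose documentation describes it as using Patefield's algorithm. This is an illustration, not the internal code of `chisq.test`: the p-value formula `(1 + count) / (B + 1)` is an assumption, and the formula used internally may differ between R versions. The names `stat_obs`, `stat_sim` and `p_sim` are made up for the example.

```r
set.seed(1)                           # for reproducibility of the sketch

# Illustrative 2 x 2 table of counts.
x <- matrix(c(12, 5, 7, 7), ncol = 2)
expected <- outer(rowSums(x), colSums(x)) / sum(x)

# Observed Pearson statistic (no continuity correction).
stat_obs <- sum((x - expected)^2 / expected)

# Draw B random tables sharing the row and column totals of x.
B <- 2000
tables <- r2dtable(B, rowSums(x), colSums(x))

# Pearson statistic for each simulated table.
stat_sim <- sapply(tables, function(tab) sum((tab - expected)^2 / expected))

# Monte Carlo p-value (assumed form; see the note above).
p_sim <- (1 + sum(stat_sim >= stat_obs)) / (B + 1)
p_sim
```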

In the goodness-of-fit case, the simulation is done by random sampling from the discrete distribution specified by `p`, each sample being of size `n = sum(x)`. This simulation is done in raw R and is slow.
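A comparable simulation for the goodness-of-fit case can be sketched with `rmultinom()`, which draws count vectors of size `n` from the distribution `p`. Using `rmultinom()` is a choice made for this illustration and is not claimed to be what `chisq.test` does internally; the p-value formula `(1 + count) / (B + 1)` is likewise an assumption. The data are illustrative.

```r
set.seed(42)                          # for reproducibility of the sketch

# Illustrative counts and null probabilities.
x <- c(89, 37, 30, 28, 2)
p <- c(0.40, 0.20, 0.20, 0.19, 0.01)
n <- sum(x)

# Observed Pearson statistic.
expected <- n * p
stat_obs <- sum((x - expected)^2 / expected)

# Each column of 'sims' is one simulated sample of size n.
B <- 2000
sims <- rmultinom(B, size = n, prob = p)

# Pearson statistic per column; 'expected' recycles down each column.
stat_sim <- colSums((sims - expected)^2 / expected)

# Monte Carlo p-value (assumed form; see the note above).
p_sim <- (1 + sum(stat_sim >= stat_obs)) / (B + 1)
p_sim
```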

A list with class `"htest"` containing the following components:

`statistic` |
the value of the chi-squared test statistic. |

`parameter` |
the degrees of freedom of the approximate
chi-squared distribution of the test statistic, `NA` if the
p-value is computed by Monte Carlo simulation. |

`p.value` |
the p-value for the test. |

`method` |
a character string indicating the type of test performed, and whether Monte Carlo simulation or continuity correction was used. |

`data.name` |
a character string giving the name(s) of the data. |

`observed` |
the observed counts. |

`expected` |
the expected counts under the null hypothesis. |

`residuals` |
the Pearson residuals, `(observed - expected) / sqrt(expected)`. |
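The relationship between the returned components can be verified directly. The sketch below uses an illustrative table and turns continuity correction off, since the identity between the squared residuals and the statistic is only expected to hold for the uncorrected Pearson statistic; `resid_manual` is a name made up for the example.

```r
# Illustrative 2 x 2 table of counts.
x <- matrix(c(12, 5, 7, 7), ncol = 2)
res <- chisq.test(x, correct = FALSE)

# Pearson residuals rebuilt from the observed and expected components.
resid_manual <- (res$observed - res$expected) / sqrt(res$expected)
all.equal(res$residuals, resid_manual)

# Without continuity correction, the squared residuals sum to the statistic.
all.equal(sum(res$residuals^2), unname(res$statistic))
```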

Patefield, W. M. (1981)
Algorithm AS159. An efficient method of generating r x c tables
with given row and column totals.
*Applied Statistics* **30**, 91–97.

    ## Not really a good example
    chisq.test(InsectSprays$count > 7, InsectSprays$spray)
                                    # Prints test summary
    chisq.test(InsectSprays$count > 7, InsectSprays$spray)$obs
                                    # Counts observed
    chisq.test(InsectSprays$count > 7, InsectSprays$spray)$exp
                                    # Counts expected under the null

    ## Effect of simulating p-values
    x <- matrix(c(12, 5, 7, 7), ncol = 2)
    chisq.test(x)$p.value           # 0.4233
    chisq.test(x, simulate.p.value = TRUE, B = 10000)$p.value
                                    # around 0.29!

    ## Testing for population probabilities
    ## Case A. Tabulated data
    x <- c(A = 20, B = 15, C = 25)
    chisq.test(x)
    chisq.test(as.table(x))         # the same
    x <- c(89,37,30,28,2)
    p <- c(40,20,20,15,5)
    try(
    chisq.test(x, p = p)            # gives an error
    )
    chisq.test(x, p = p, rescale.p = TRUE)
                                    # works
    p <- c(0.40,0.20,0.20,0.19,0.01)
                                    # Expected count in category 5
                                    # is 1.86 < 5 ==> chi square approx.
    chisq.test(x, p = p)            #               maybe doubtful, but is ok!
    chisq.test(x, p = p, simulate.p.value = TRUE)

    ## Case B. Raw data
    x <- trunc(5 * runif(100))
    chisq.test(table(x))            # NOT 'chisq.test(x)'!

[Package *stats* version 2.1.0 Index]